Python code for Lite Audio-Visual Speech Enhancement.

Last update: Dec 01, 2022

Overview

Lite Audio-Visual Speech Enhancement (Interspeech 2020)

Introduction

This is the PyTorch implementation of Lite Audio-Visual Speech Enhancement (LAVSE).

Some preprocessed sample data, including enhanced results, is also provided in this repository.

The TMSV (Taiwan Mandarin speech with video) dataset used in LAVSE is released here.

Please cite the following paper if you find the code useful in your research.

@inproceedings{chuang2020lite,
  title={Lite Audio-Visual Speech Enhancement},
  author={Chuang, Shang-Yi and Tsao, Yu and Lo, Chen-Chou and Wang, Hsin-Min},
  booktitle={Proc. Interspeech 2020},
  year={2020}
}

Prerequisites

Ubuntu 18.04
Python 3.6
CUDA 10

You can use pip to install the Python dependencies.

pip install -r requirements.txt

Usage

Start visdom first (for example in a screen or tmux session) so that the training loss can be recorded, then run the command below. The average PESQ and STOI results will be printed to your terminal.

bash run.sh

See run.sh for details on the individual commands.
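
Because the training loss is recorded with visdom, it can help to confirm that a visdom server is reachable before starting run.sh. The following is only a minimal sketch, assuming the server was started with `python -m visdom.server` on localhost with visdom's default port 8097. The helper name `visdom_is_up` is hypothetical and is not part of this repository.

```python
import socket


def visdom_is_up(host="localhost", port=8097, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Assumption: visdom listens on its default port 8097. This only checks
    that something accepts connections there, not that it is visdom.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    if visdom_is_up():
        print("visdom server reachable; you can run: bash run.sh")
    else:
        print("visdom server not reachable; start it with: python -m visdom.server")
```

If your visdom server runs on another host or port, pass them explicitly, for example `visdom_is_up("localhost", 8098)`.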

License

LAVSE is released under the MIT License.

See LICENSE for more details.

Acknowledgments

Bio-ASP Lab, CITI, Academia Sinica, Taipei, Taiwan
SLAM Lab, IIS, Academia Sinica, Taipei, Taiwan

Owner

Shang-Yi Chuang
