Python code for Lite Audio-Visual Speech Enhancement.

Last update: Dec 01, 2022

Overview

Lite Audio-Visual Speech Enhancement (Interspeech 2020)

Introduction

This is the PyTorch implementation of Lite Audio-Visual Speech Enhancement (LAVSE).

We have also put some preprocessed sample data (including enhanced results) in this repository.

The TMSV (Taiwan Mandarin speech with video) dataset used in LAVSE is released here.

Please cite the following paper if you find the code useful in your research.

@inproceedings{chuang2020lite,
  title={Lite Audio-Visual Speech Enhancement},
  author={Chuang, Shang-Yi and Tsao, Yu and Lo, Chen-Chou and Wang, Hsin-Min},
  booktitle={Proc. Interspeech 2020}
}

Prerequisites

Ubuntu 18.04
Python 3.6
CUDA 10

You can use pip to install the Python dependencies.

pip install -r requirements.txt
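After installing, it can help to confirm that the interpreter and GPU setup roughly match the prerequisites listed above. The following is a minimal sketch, not part of the LAVSE code: the function name check_environment is hypothetical, and treating Python 3.6 as a minimum version (rather than an exact requirement) is an assumption.

```python
import sys


def check_environment(min_python=(3, 6)):
    """Summarize the local setup (hypothetical helper, not part of LAVSE)."""
    report = {
        "python_version": "%d.%d" % sys.version_info[:2],
        # Assumption: the listed Python 3.6 is treated as a minimum version.
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        import torch  # expected to be installed via requirements.txt
    except ImportError:
        return report
    report["torch_installed"] = True
    # True only if PyTorch can see a usable CUDA device.
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```

If cuda_available comes back False, the installed CUDA driver or PyTorch build probably needs attention before training.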

Usage

Run the command below; the average PESQ and STOI results will be printed in your terminal.

Before running the script, remember to start visdom (for example, inside a screen or tmux session) so that the training loss can be recorded.

bash run.sh

See run.sh for further details on the commands it runs.
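Since training logs the loss to visdom, a quick check that a visdom server is actually listening can save a failed run. The sketch below only tests whether a TCP port accepts connections. It assumes visdom runs on its default port 8097 on the local machine, which may differ from what run.sh expects, and the function name visdom_reachable is hypothetical.

```python
import socket


def visdom_reachable(host="localhost", port=8097, timeout=1.0):
    """Return True if something accepts TCP connections at host:port.

    Assumption: visdom was started with its default port (8097),
    e.g. via `python -m visdom.server` in a screen or tmux session.
    This checks the port only; it does not verify the service is visdom.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    if visdom_reachable():
        print("visdom port is open; ready to run: bash run.sh")
    else:
        print("no server on port 8097; start visdom first")
```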

License

The LAVSE work is released under the MIT License.

See LICENSE for more details.

Acknowledgments

Bio-ASP Lab, CITI, Academia Sinica, Taipei, Taiwan
SLAM Lab, IIS, Academia Sinica, Taipei, Taiwan

Owner

Shang-Yi Chuang
