Pseudo-Visual Speech Denoising

Overview

Pseudo-Visual Speech Denoising

This code is for our paper titled: Visual Speech Enhancement Without A Real Visual Stream published at WACV 2021.
Authors: Sindhu Hegde*, K R Prajwal*, Rudrabha Mukhopadhyay*, Vinay Namboodiri, C.V. Jawahar

PWC PWC

πŸ“ Paper πŸ“‘ Project Page πŸ›  Demo Video πŸ—ƒ Real-World Test Set
Paper Website Video Real-World Test Set (coming soon)


Features

  • Denoise any real-world audio/video and obtain the clean speech.
  • Works in unconstrained settings for any speaker in any language.
  • Inputs only audio but uses the benefits of lip movements by generating a synthetic visual stream.
  • Complete training code and inference codes available.

Prerequisites

  • Python 3.7.4 (Code has been tested with this version)
  • ffmpeg: sudo apt-get install ffmpeg
  • Install necessary packages using pip install -r requirements.txt
  • Face detection pre-trained model should be downloaded to face_detection/detection/sfd/s3fd.pth

Getting the weights

Model Description Link to the model
Denoising model Weights of the denoising model (needed for inference) Link
Lipsync student Weights of the student lipsync model to generate the visual stream for noisy audio inputs (needed for inference) Link
Wav2Lip teacher Weights of the teacher lipsync model (only needed if you want to train the network from scratch) Link

Denoising any audio/video using the pre-trained model (Inference)

You can denoise any noisy audio/video and obtain the clean speech of the target speaker using:

python inference.py --lipsync_student_model_path= --checkpoint_path= --input=

The result is saved (by default) in results/result.mp4. The result directory can be specified in arguments, similar to several other available options. The input file can be any audio file: *.wav, *.mp3 or even a video file, from which the code will automatically extract the audio and generate the clean speech. Note that the noise should not be human speech, as this work only tackles the denoising task, not speaker separation.

Generating only the lip-movements for any given noisy audio/video

The synthetic visual stream (lip-movements) can be generated for any noisy audio/video using:

cd lipsync
python inference.py --checkpoint_path= --audio=

The result is saved (by default) in results/result_voice.mp4. The result directory can be specified in arguments, similar to several other available options. The input file can be any audio file: *.wav, *.mp3 or even a video file, from which the code will automatically extract the audio and generate the visual stream.

Training

We illustrate the training process using the LRS3 and VGGSound dataset. Adapting for other datasets would involve small modifications to the code.

Preprocess the dataset

LRS3 train-val/pre-train dataset folder structure
data_root (we use both train-val and pre-train sets of LSR3 dataset in this work)
β”œβ”€β”€ list of folders
β”‚   β”œβ”€β”€ five-digit numbered video IDs ending with (.mp4)
Preprocess the dataset
python preprocess.py --data_root= --preprocessed_root=

Additional options like batch_size and number of GPUs to use in parallel to use can also be set.

Preprocessed LRS3 folder structure
preprocessed_root (lrs3_preprocessed)
β”œβ”€β”€ list of folders
|	β”œβ”€β”€ Folders with five-digit numbered video IDs
|	β”‚   β”œβ”€β”€ *.jpg (extracted face crops from each frame)
VGGSound folder structure

We use VGGSound dataset as noisy data which is mixed with the clean speech from LRS3 dataset. We download the audio files (*.wav files) from here.

data_root (vgg_sound)
β”œβ”€β”€ *.wav (audio files)

Train!

There are two major steps: (i) Train the student-lipsync model, (ii) Train the Denoising model.

Train the Student-Lipsync model

Navigate to the lipsync folder: cd lipsync

The lipsync model can be trained using:

python train_student.py --data_root_lrs3_pretrain= --data_root_lrs3_train= --noise_data_root= --wav2lip_checkpoint_path= --checkpoint_dir=

Note: The pre-trained Wav2Lip teacher model must be downloaded (wav2lip weights) before training the student model.

Train the Denoising model!

Navigate to the main directory: cd ..

The denoising model can be trained using:

python train.py --data_root_lrs3_pretrain= --data_root_lrs3_train= --noise_data_root= --lipsync_student_model_path= --checkpoint_dir=

The model can be resumed for training as well. Look at python train.py --help for more details. Also, additional less commonly-used hyper-parameters can be set at the bottom of the audio/hparams.py file.


Evaluation

To be updated soon!


Licence and Citation

The software is licensed under the MIT License. Please cite the following paper if you have used this code:

@InProceedings{Hegde_2021_WACV,
    author    = {Hegde, Sindhu B. and Prajwal, K.R. and Mukhopadhyay, Rudrabha and Namboodiri, Vinay P. and Jawahar, C.V.},
    title     = {Visual Speech Enhancement Without a Real Visual Stream},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2021},
    pages     = {1926-1935}
}

Acknowledgements

Parts of the lipsync code has been modified using our Wav2Lip repository. The audio functions and parameters are taken from this TTS repository. We thank the authors for this wonderful code. The code for Face Detection has been taken from the face_alignment repository. We thank the authors for releasing their code and models.

Owner
Sindhu
Masters' by Research (MS) @ CVIT, IIIT Hyderabad
Sindhu
Semi-Supervised Learning with Ladder Networks in Keras. Get 98% test accuracy on MNIST with just 100 labeled examples !

Semi-Supervised Learning with Ladder Networks in Keras This is an implementation of Ladder Network in Keras. Ladder network is a model for semi-superv

Divam Gupta 101 Sep 07, 2022
Twin-deep neural network for semi-supervised learning of materials properties

Deep Semi-Supervised Teacher-Student Material Synthesizability Prediction Citation: Semi-supervised teacher-student deep neural network for materials

MLEG 3 Dec 14, 2022
A mini-course offered to Undergrad chemistry students

The best way to use this material is by forking it by click the Fork button at the top, right corner. Then you will get your own copy to play with! Th

Raghu 19 Dec 19, 2022
Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions

Natural Posterior Network This repository provides the official implementation o

Oliver Borchert 54 Dec 06, 2022
Pixel-wise segmentation on VOC2012 dataset using pytorch.

PiWiSe Pixel-wise segmentation on the VOC2012 dataset using pytorch. FCN SegNet PSPNet UNet RefineNet For a more complete implementation of segmentati

Bodo Kaiser 378 Dec 30, 2022
City Surfaces: City-scale Semantic Segmentation of Sidewalk Surfaces

City Surfaces: City-scale Semantic Segmentation of Sidewalk Surfaces Paper Temporary GitHub page for City Surfaces paper. More soon! While designing s

14 Nov 10, 2022
The code for 'Deep Residual Fourier Transformation for Single Image Deblurring'

Deep Residual Fourier Transformation for Single Image Deblurring Xintian Mao, Yiming Liu, Wei Shen, Qingli Li and Yan Wang code will be released soon

145 Dec 13, 2022
Software & Hardware to do multi color printing with Sharpies

3D Print Colorizer is a combination of 3D printed parts and a Cura plugin which allows anyone with an Ender 3 like 3D printer to produce multi colored

343 Jan 06, 2023
Deep Reinforcement Learning based autonomous navigation for quadcopters using PPO algorithm.

PPO-based Autonomous Navigation for Quadcopters This repository contains an implementation of Proximal Policy Optimization (PPO) for autonomous naviga

Bilal Kabas 16 Nov 11, 2022
Codes for "Template-free Prompt Tuning for Few-shot NER".

EntLM The source codes for EntLM. Dependencies: Cuda 10.1, python 3.6.5 To install the required packages by following commands: $ pip3 install -r requ

77 Dec 27, 2022
Keras Implementation of The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation by (Simon JΓ©gou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio)

The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation: Work In Progress, Results can't be replicated yet with the m

Yad Konrad 196 Aug 30, 2022
Explainable Zero-Shot Topic Extraction

Zero-Shot Topic Extraction with Common-Sense Knowledge Graph This repository contains the code for reproducing the results reported in the paper "Expl

D2K Lab 56 Dec 14, 2022
Painting app using Python machine learning and vision technology.

AI Painting App We are making an app that will track our hand and helps us to draw from that. We will be using the advance knowledge of Machine Learni

Badsha Laskar 3 Oct 03, 2022
Readings for "A Unified View of Relational Deep Learning for Polypharmacy Side Effect, Combination Therapy, and Drug-Drug Interaction Prediction."

Polypharmacy - DDI - Synergy Survey The Survey Paper This repository accompanies our survey paper A Unified View of Relational Deep Learning for Polyp

AstraZeneca 79 Jan 05, 2023
CLIP (Contrastive Language–Image Pre-training) for Italian

Italian CLIP CLIP (Radford et al., 2021) is a multimodal model that can learn to represent images and text jointly in the same space. In this project,

Italian CLIP 114 Dec 29, 2022
Apply AnimeGAN-v2 across frames of a video clip

title emoji colorFrom colorTo sdk app_file pinned AnimeGAN-v2 For Videos πŸ”₯ blue red gradio app.py false AnimeGAN-v2 For Videos Apply AnimeGAN-v2 acro

Nathan Raw 36 Oct 18, 2022
Video Instance Segmentation using Inter-Frame Communication Transformers (NeurIPS 2021)

Video Instance Segmentation using Inter-Frame Communication Transformers (NeurIPS 2021) Paper Video Instance Segmentation using Inter-Frame Communicat

Sukjun Hwang 81 Dec 29, 2022
Iterative Normalization: Beyond Standardization towards Efficient Whitening

IterNorm Code for reproducing the results in the following paper: Iterative Normalization: Beyond Standardization towards Efficient Whitening Lei Huan

Lei Huang 21 Dec 27, 2022
Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement (NeurIPS 2020)

MTTS-CAN: Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement Paper Xin Liu, Josh Fromm, Shwetak Patel, Daniel M

Xin Liu 106 Dec 30, 2022
Repository for training material for the 2022 SDSC HPC/CI User Training Course

hpc-training-2022 Repository for training material for the 2022 SDSC HPC/CI Training Series HPC/CI Training Series home https://www.sdsc.edu/event_ite

sdsc-hpc-training-org 21 Jul 27, 2022