Lip to Speech Synthesis with Visual Context Attentional GAN

This repository contains the PyTorch implementation of the following paper:

Lip to Speech Synthesis with Visual Context Attentional GAN
Minsu Kim, Joanna Hong, and Yong Man Ro
[Paper] [Demo Video]

Requirements

python 3.7
pytorch 1.6 to 1.8
torchvision
torchaudio
ffmpeg
av
tensorboard
scikit-image 0.17.0 or later
opencv-python 3.4 or later
pillow
librosa
pystoi
pesq
scipy
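The version ranges listed above can be checked before training with a small script. This is a hypothetical helper, not part of this repository. It is a sketch under stated assumptions. The pip distribution names are assumptions: `torch` for pytorch, `Pillow` for pillow, and `opencv-python` for OpenCV. A headless or contrib OpenCV build would be reported as missing. The version bounds are read as inclusive.

```python
"""Hypothetical environment check (not part of this repository).

Compares installed packages with the version ranges in the Requirements list.
Assumed pip distribution names: torch for pytorch, Pillow for pillow, and
opencv-python (a headless or contrib OpenCV build would be reported missing).
"""
import re
import shutil
import sys

try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: requires `pip install importlib-metadata`
    from importlib_metadata import PackageNotFoundError, version

# Distribution name -> (inclusive minimum, inclusive maximum); None = unbounded.
# A bound is compared only up to its own length, so (1, 8) admits any 1.8.x.
REQUIREMENTS = {
    "torch": ((1, 6), (1, 8)),
    "torchvision": (None, None),
    "torchaudio": (None, None),
    "av": (None, None),
    "tensorboard": (None, None),
    "scikit-image": ((0, 17, 0), None),
    "opencv-python": ((3, 4), None),
    "Pillow": (None, None),
    "librosa": (None, None),
    "pystoi": (None, None),
    "pesq": (None, None),
    "scipy": (None, None),
}


def parse_version(text):
    """Keep the leading numeric components: '1.8.1+cu111' -> (1, 8, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # a suffix such as 'rc1' or '+cu111' ends the version
            break
    return tuple(parts)


def _prefix(installed, length):
    """Zero-pad or truncate a version tuple to `length` components."""
    return (installed + (0,) * length)[:length]


def in_range(installed, minimum=None, maximum=None):
    """Return True if `installed` lies within the optional inclusive bounds."""
    if minimum is not None and _prefix(installed, len(minimum)) < minimum:
        return False
    if maximum is not None and _prefix(installed, len(maximum)) > maximum:
        return False
    return True


def check_environment():
    """Return human-readable problems; an empty list means everything matched."""
    problems = []
    if sys.version_info[:2] != (3, 7):
        problems.append("python %d.%d found, the README lists 3.7" % sys.version_info[:2])
    if shutil.which("ffmpeg") is None:  # ffmpeg is a system binary, not a pip package
        problems.append("ffmpeg executable not found on PATH")
    for name, (minimum, maximum) in REQUIREMENTS.items():
        try:
            installed_text = version(name)
        except PackageNotFoundError:
            problems.append("%s is not installed" % name)
            continue
        if not in_range(parse_version(installed_text), minimum, maximum):
            problems.append("%s %s is outside the listed range" % (name, installed_text))
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment matches the listed requirements.")
```

The script only reads package metadata and prints warnings, so it is safe to run in any environment. Comparing each bound only up to its own length means "1.6 to 1.8" accepts patch releases such as 1.8.1 while rejecting 1.9.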

GRID

Please refer to README_GRID.md to run the code on the GRID dataset.

LRS2/LRS3

Please refer to README_LRS.md to run the code and models on the LRS2 and LRS3 datasets.

Citation

If you find this work useful in your research, please cite the following papers:

@article{kim2021vcagan,
  title={Lip to Speech Synthesis with Visual Context Attentional GAN},
  author={Kim, Minsu and Hong, Joanna and Ro, Yong Man},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

@inproceedings{kim2023lip,
  title={Lip-to-speech synthesis in the wild with multi-task learning},
  author={Kim, Minsu and Hong, Joanna and Ro, Yong Man},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
