Lip to Speech Synthesis with Visual Context Attentional GAN

This repository contains the PyTorch implementation of the following paper (NeurIPS 2021):

Lip to Speech Synthesis with Visual Context Attentional GAN
Minsu Kim, Joanna Hong, and Yong Man Ro
[Paper] [Demo Video]

Requirements

  • python 3.7
  • pytorch 1.6 ~ 1.8
  • torchvision
  • torchaudio
  • ffmpeg
  • av
  • tensorboard
  • scikit-image 0.17.0 or later
  • opencv-python 3.4 or later
  • pillow
  • librosa
  • pystoi
  • pesq
  • scipy
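
The pystoi, pesq, and librosa entries above are the packages used for objective evaluation of the synthesized speech. As a quick sanity check of the environment (a minimal sketch with placeholder file names and an assumed 16 kHz sampling rate, not code taken from this repository), STOI, ESTOI, and PESQ between a reference and a generated waveform can be computed as follows:

# Minimal sketch: score a generated waveform against its ground truth.
# Assumes 16 kHz mono audio; "reference.wav" and "generated.wav" are placeholders.
import librosa
from pystoi import stoi
from pesq import pesq

ref, sr = librosa.load("reference.wav", sr=16000)   # ground-truth speech
gen, _  = librosa.load("generated.wav", sr=16000)   # synthesized speech

length = min(len(ref), len(gen))                    # align lengths before scoring
ref, gen = ref[:length], gen[:length]

print("STOI :", stoi(ref, gen, sr, extended=False)) # short-time objective intelligibility
print("ESTOI:", stoi(ref, gen, sr, extended=True))  # extended STOI
print("PESQ :", pesq(sr, ref, gen, "wb"))           # wide-band PESQ (requires 16 kHz audio)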

GRID

Please refer here to run the code on the GRID dataset.

LRS2/LRS3

Please refer here to run the code and model on the LRS2 and LRS3 datasets.
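
Both pipelines take a talking-face video as input and synthesize the corresponding speech waveform. As an illustration only (a minimal sketch with a placeholder file name, not the repository's actual data loader), a clip can be decoded with the listed torchvision/av/ffmpeg dependencies like this:

# Minimal sketch: decode one clip into video frames and its audio track.
# "sample_clip.mp4" is a placeholder path, not a file shipped with this repo.
from torchvision.io import read_video

video, audio, info = read_video("sample_clip.mp4", pts_unit="sec")

print(video.shape)   # (T, H, W, C) uint8 frames for the visual front-end
print(audio.shape)   # (channels, samples) waveform used as the speech target
print(info)          # e.g. {'video_fps': 25.0, 'audio_fps': 16000}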

Citation

If you find this work useful in your research, please cite the following papers:

@article{kim2021vcagan,
  title={Lip to Speech Synthesis with Visual Context Attentional GAN},
  author={Kim, Minsu and Hong, Joanna and Ro, Yong Man},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

@inproceedings{kim2023lip,
  title={Lip-to-speech synthesis in the wild with multi-task learning},
  author={Kim, Minsu and Hong, Joanna and Ro, Yong Man},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
