This repository contains the PyTorch implementation of the following paper:
Lip to Speech Synthesis with Visual Context Attentional GAN
Minsu Kim, Joanna Hong, and Yong Man Ro
[Paper] [Demo Video]
- python 3.7
- pytorch 1.6 ~ 1.8
- torchvision
- torchaudio
- ffmpeg
- av
- tensorboard
- scikit-image >= 0.17.0
- opencv-python >= 3.4
- pillow
- librosa
- pystoi
- pesq
- scipy
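A quick way to confirm that the requirements above are present is a small check script. The following is a minimal sketch and is not part of the repository: `check_environment` and `REQUIRED_MODULES` are hypothetical names, and the import names (for example `skimage` for scikit-image, `cv2` for opencv-python, `PIL` for pillow) are the usual module names for these packages. It only reports whether each item can be found and does not verify the version ranges.

```python
import importlib.util
import shutil
import sys

# Assumed mapping from the package names listed above to their import names.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "av": "av",
    "tensorboard": "tensorboard",
    "scikit-image": "skimage",
    "opencv-python": "cv2",
    "pillow": "PIL",
    "librosa": "librosa",
    "pystoi": "pystoi",
    "pesq": "pesq",
    "scipy": "scipy",
}


def check_environment():
    """Return a dict mapping each requirement to True (found) or False."""
    status = {}
    for package, module in REQUIRED_MODULES.items():
        # find_spec locates a top-level module without importing it.
        status[package] = importlib.util.find_spec(module) is not None
    # ffmpeg is a command-line tool, so look for it on PATH instead.
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    print("python", "%d.%d" % sys.version_info[:2])
    for name, found in check_environment().items():
        print("%-14s %s" % (name, "ok" if found else "MISSING"))
```

Running the script prints one line per requirement, so anything marked `MISSING` can be installed before training or evaluation.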
Please refer here to run the code on the GRID dataset.
Please refer here to run the code and model on the LRS2 and LRS3 datasets.
If you find this work useful in your research, please cite the following papers:
@article{kim2021vcagan,
  title={Lip to Speech Synthesis with Visual Context Attentional GAN},
  author={Kim, Minsu and Hong, Joanna and Ro, Yong Man},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
@inproceedings{kim2023lip,
  title={Lip-to-speech synthesis in the wild with multi-task learning},
  author={Kim, Minsu and Hong, Joanna and Ro, Yong Man},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}