GradTTS

Unofficial PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arxiv)

About this repo

This is an unofficial implementation of GradTTS, built on top of GlowTTS (https://github.com/jaywalnut310/glow-tts). We replace the GlowDecoder with a DiffusionDecoder that follows the settings of the original paper. For convenience, we also replace torch.distributed with Horovod. fp16 training is not used for now.
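For readers unfamiliar with the diffusion decoder, the forward process described in the Grad-TTS paper can be sketched as follows. This is a minimal illustration rather than code from this repo: it assumes an identity covariance and a linear noise schedule with beta_0 = 0.05 and beta_1 = 20 (the values reported in the paper), and the function names `cumulative_beta`, `forward_diffusion_stats`, and `sample_xt` are hypothetical. Whether DiffusionDecoder uses exactly these values should be checked against the code.

```python
import math
import random


def cumulative_beta(t, beta_0=0.05, beta_1=20.0):
    """Integral of the linear schedule beta_s = beta_0 + (beta_1 - beta_0) * s over [0, t]."""
    return beta_0 * t + 0.5 * (beta_1 - beta_0) * t * t


def forward_diffusion_stats(x0, mu, t, beta_0=0.05, beta_1=20.0):
    """Mean and (scalar) variance of X_t given X_0, assuming identity covariance.

    The mean drifts from the clean value x0 toward the text-dependent
    prior mean mu, while the variance grows from 0 toward 1.
    """
    decay = math.exp(-0.5 * cumulative_beta(t, beta_0, beta_1))
    mean = [m + (x - m) * decay for x, m in zip(x0, mu)]
    variance = 1.0 - decay * decay
    return mean, variance


def sample_xt(x0, mu, t, rng=random):
    """Draw one noisy sample X_t from the Gaussian described above."""
    mean, variance = forward_diffusion_stats(x0, mu, t)
    std = math.sqrt(variance)
    return [m + std * rng.gauss(0.0, 1.0) for m in mean]
```

At t = 0 the sample equals the clean input, and at t = 1 it is close to a unit-variance Gaussian centred on mu, which is the distribution the decoder starts from at inference time.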

Updates

2021/07/28: Uploaded LJSpeech samples, which match the quality of the original paper's demo.

Training and inference

See the egs/ folder, in particular run.sh and inference_waveglow_vocoder.py, for example usage.

Before training, download and extract the LJ Speech dataset, then rename the dataset folder or create a link to it: ln -s /path/to/LJSpeech-1.1/wavs DUMMY. Then build the Monotonic Alignment Search code (Cython): cd monotonic_align; python setup.py build_ext --inplace.

Before inference, download the WaveGlow checkpoint from download_link and put it in the waveglow folder.
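The training prerequisites above can be verified with a small preflight check before launching run.sh. The sketch below is not part of this repo: `check_setup` is a hypothetical helper, and both the directory that should contain DUMMY and the location of monotonic_align are passed in as arguments because they depend on where you run training from.

```python
from pathlib import Path


def check_setup(work_dir=".", align_dir="monotonic_align"):
    """Return a list of human-readable problems; an empty list means both checks passed.

    work_dir  -- directory expected to contain the DUMMY link (assumption: adjust to your layout)
    align_dir -- the monotonic_align folder where build_ext --inplace was run
    """
    work_dir = Path(work_dir)
    align_dir = Path(align_dir)
    problems = []

    # DUMMY should resolve to the LJSpeech-1.1/wavs directory.
    dummy = work_dir / "DUMMY"
    if not dummy.is_dir():
        problems.append("DUMMY is missing or is not a directory")
    elif not any(dummy.glob("*.wav")):
        problems.append("DUMMY contains no .wav files")

    # An in-place Cython build leaves a compiled extension (.so or .pyd) behind.
    built = []
    if align_dir.is_dir():
        built = [p for pattern in ("*.so", "*.pyd") for p in align_dir.rglob(pattern)]
    if not built:
        problems.append("no compiled extension found under monotonic_align")

    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("setup problem:", problem)
```

Run from the directory where you created the DUMMY link, the script prints nothing when both steps have been completed.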

Reference Materials

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

GlowTTS

Score-Based Generative Modeling through Stochastic Differential Equations

score_sde_pytorch

denoising-diffusion-pytorch

Authors

Heyang Xue (https://github.com/WelkinYang) and Qicong Xie (https://github.com/QicongXie)
