PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Last update: Nov 14, 2022

Overview

VAENAR-TTS - PyTorch Implementation

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Quickstart

Dependencies

You can install the Python dependencies with

pip3 install -r requirements.txt

Inference

You have to download the pretrained models and put them in output/ckpt/LJSpeech/.

For English single-speaker TTS, run

python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step RESTORE_STEP --mode single -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

The generated utterances will be put in output/result/.

Batch Inference

Batch inference is also supported, try

python3 synthesize.py --source preprocessed_data/LJSpeech/val.txt --restore_step RESTORE_STEP --mode batch -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

to synthesize all utterances in preprocessed_data/LJSpeech/val.txt

Training

Datasets

The supported datasets are

LJSpeech: a single-speaker English dataset consists of 13100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total.

Preprocessing

First, run

python3 prepare_align.py config/LJSpeech/preprocess.yaml

for some preparations. And then run the preprocessing script.

python3 preprocess.py config/LJSpeech/preprocess.yaml

Training

Train your model with

python3 train.py -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

TensorBoard

Use

tensorboard --logdir output/log/LJSpeech

to serve TensorBoard on your localhost. The loss curves, synthesized mel-spectrograms, and audios are shown.

Implementation Issues

Removed arguments, methods during converting Tensorflow to PyTorch: name, kwargs, training, get_config()
Specify in_features in LinearNorm which is corresponding to tf.keras.layers.Dense. Also, in_channels is explicitly specified in Conv1D.
get_mask_from_lengths() function returns logical not of that of FastSpeech2.
In this implementation, the griffin_lim algorithms is used to convert a mel-spectrogram to a waveform. You can use HiFi-GAN as a vocoder by setting config, but you need to train it from scratch (you cannot use the provided pre-trained HiFi-GAN model).

Citation

@misc{lee2021vaenar-tts,
  author = {Lee, Keon},
  title = {VAENAR-TTS},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/VAENAR-TTS}}
}

References

thuhcsi's VAENAR-TTS

Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”

Graph-to-Graph Transformers Self-attention models, such as Transformer, have been hugely successful in a wide range of natural language processing (NL

40 Aug 14, 2022

PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"

Non-Autoregressive Transformer Code release for Non-Autoregressive Neural Machine Translation by Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K.

261 Nov 12, 2022

Pytorch Implementation of DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis (TTS Extension)

DiffSinger - PyTorch Implementation PyTorch implementation of DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis (TTS Extension). Status

152 Jan 2, 2023

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Deepvoice3_pytorch PyTorch implementation of convolutional networks-based text-to-speech synthesis models: arXiv:1710.07654: Deep Voice 3: Scaling Tex

1.8k Jan 8, 2023

Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting.

Non-AR Spatial-Temporal Transformer Introduction Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series For

66 Nov 28, 2022

Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

GLAT Implementation for the ACL2021 paper "Glancing Transformer for Non-Autoregressive Neural Machine Translation" Requirements Python = 3.7 Pytorch

117 Jan 9, 2023

Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis

Chunked Autoregressive GAN (CARGAN) Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis [paper] [compan

150 Dec 6, 2022

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

WaveGrad2 - PyTorch Implementation PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. Status (202

59 Dec 6, 2022

SlotRefine: A Fast Non-Autoregressive Model forJoint Intent Detection and Slot Filling

SlotRefine: A Fast Non-Autoregressive Model for Joint Intent Detection and Slot Filling Reference Main paper to be cited (Di Wu et al., 2020) @article

34 Nov 3, 2022

Comments

inference results

Hi! Thank you for the port. Have you been able to get results on inference stage? I successfully train model, validation losses are decreasing, but at inference there's garbage. I started logging log-probabilities of posterior and prior networks and see that they're also going down throughout training. Logpgobs in around -70000 for both networks which is very very small number, say zero in probability space. Also if remove clipping kl divergence torch.max(kl_divergence, torch.tensor(0., device=device)) something bad happens and kl goes negative, which is not possible in math point of view, but can be if our values are not valid distributions. Then I set n_samples to 4 and reduce batch_size to 8 but still get negative values. Pytorch's implementation of KLDivloss with log_targets=True always give 0 loss values.... so.. have you had any success?

opened by thepowerfuldeez 22
For model/prior.py _initial_sample, why the prob is calculated as from N(0,1)?

Hello, thanks for sharing the pytorch-based code! However, I have some question about the _initial_sample func in model/prior.py. epsilon is sampled from N(0, t) (t is the temperature), how its logprob is calculated? For norm distribution, After log (the mean is 0) . Can you explain why use \sigma as 1 instead of t here?

opened by seekerzz 16

Releases(v1.0.0)

v1.0.0(Aug 3, 2021)

First Official Release
Source code(tar.gz)
Source code(zip)

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Related tags

Overview

VAENAR-TTS - PyTorch Implementation

Quickstart

Dependencies

Inference

Batch Inference

Training

Datasets

Preprocessing

Training

TensorBoard

Implementation Issues

Citation

References

You might also like...

Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”

PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"

Pytorch Implementation of DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis (TTS Extension)

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting.

Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

SlotRefine: A Fast Non-Autoregressive Model forJoint Intent Detection and Slot Filling

Comments

inference results

For model/prior.py _initial_sample, why the prob is calculated as from N(0,1)?

Releases(v1.0.0)

v1.0.0(Aug 3, 2021)

Owner

Keon Lee

Face detection using deep learning.

the official code for ICRA 2021 Paper: "Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation"

Scalable machine learning based time series forecasting

(ICCV'21) Official PyTorch implementation of Relational Embedding for Few-Shot Classification

OpenMMLab Computer Vision Foundation

Weakly Supervised Learning of Rigid 3D Scene Flow

Code for our paper: Online Variational Filtering and Parameter Learning

This repository contains the reference implementation for our proposed Convolutional CRFs.

The audio-video synchronization of MKV Container Format is exploited to achieve data hiding

Justmagic - Use a function as a method with this mystic script, like in Nim

Learning-Augmented Dynamic Power Management

Computer Vision and Pattern Recognition, NUS CS4243, 2022

High performance distributed framework for training deep learning recommendation models based on PyTorch.

Code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".

Bu repo SAHI uygulamasını mantığını öğreniyoruz.

Implementation and replication of ProGen, Language Modeling for Protein Generation, in Jax

[NeurIPS 2021] Garment4D: Garment Reconstruction from Point Cloud Sequences

Sync2Gen Code for ICCV 2021 paper: Scene Synthesis via Uncertainty-Driven Attribute Synchronization

Scheduling BilinearRewards

[ICCV 2021 Oral] Deep Evidential Action Recognition