The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

Last update: Oct 28, 2022

Related tags

Overview

VAENAR-TTS

This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis".

Samples | Paper | Pretrained Models

Usage

0. Dataset

English: LJSpeech
Mandarin: DataBaker(标贝)

1. Environment setup

conda env create -f environment.yml
conda activate vaenartts-env

2. Data pre-processing

For English using LJSpeech:

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset ljspeech --data_dir /path/to/extracted/LJSpeech-1.1 --save_dir ./ljspeech

For Mandarin using Databaker(标贝):

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset databaker --data_dir /path/to/extracted/biaobei --save_dir ./databaker

3. Training

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset ljspeech --log_dir ./lj-log_dir --test_dir ./lj-test_dir --data_dir ./ljspeech/tfrecords/ --model_dir ./lj-model_dir

For Mandarin using Databaker(标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset databaker --log_dir ./db-log_dir --test_dir ./db-test_dir --data_dir ./databaker/tfrecords/ --model_dir ./db-model_dir

4. Inference (synthesize speech for the whole test set)

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset ljspeech --test_dir ./lj-test-2000 --data_dir ./ljspeech/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./lj-model_dir/ckpt-2000

For Mandarin using Databaker(标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./databaker/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./db-model_dir/ckpt-2000

The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

Related tags

Overview

VAENAR-TTS

Samples | Paper | Pretrained Models

Usage

0. Dataset

1. Environment setup

2. Data pre-processing

3. Training

4. Inference (synthesize speech for the whole test set)

Reference

Owner

THUHCSI

Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)

SBINN: Systems-biology informed neural network

MonoScene: Monocular 3D Semantic Scene Completion

A Bayesian cognition approach for belief updating of correlation judgement through uncertainty visualizations

A (PyTorch) imbalanced dataset sampler for oversampling low frequent classes and undersampling high frequent ones.

Bib-parser - Convenient script to parse .bib files with the ACM Digital Library like metadata

The project of phase's key role in complex and real NN

A curated list of automated deep learning (including neural architecture search and hyper-parameter optimization) resources.

A generalist algorithm for cell and nucleus segmentation.

ML models implementation practice

Code for paper: "Spinning Language Models for Propaganda-As-A-Service"

Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

An image classification app boilerplate to serve your deep learning models asap!

Code for CVPR 2018 paper --- Texture Mapping for 3D Reconstruction with RGB-D Sensor

clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation

Code for the TPAMI paper: "Syntax Customized Video Captioning by Imitating Exemplar Sentences"

Moon-patrol - A faithful recreation of the 1983 hit classic Moon Patrol for the Atari 2600 created using the Pygame library for Python

Implementation of "Fast and Flexible Temporal Point Processes with Triangular Maps" (Oral @ NeurIPS 2020)