AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

Last update: Dec 28, 2022

Related tags

Deep Learning AdaSpeech2

Overview

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data [WIP]

Unofficial PyTorch implementation of AdaSpeech 2.

Requirements :

All code is written in Python 3.6.2.

Install PyTorch

Before installing PyTorch, check your CUDA version by running the following command :

nvcc --version

pip install torch torchvision

This repo uses PyTorch 1.6.0 for the torch.bucketize function, which is not available in earlier versions of PyTorch.
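For readers unfamiliar with torch.bucketize: given sorted boundaries, it maps each value to the index of the bin it falls into. The sketch below mimics that behaviour with the standard-library bisect module so it runs without PyTorch. The `bucketize` helper name and the boundary values are illustrative assumptions, not code from this repo.

```python
from bisect import bisect_left


def bucketize(values, boundaries):
    """Return, for each value, the index of the first boundary >= value.

    This mirrors torch.bucketize(values, boundaries) with its default
    right=False; `boundaries` must be sorted in ascending order.
    """
    return [bisect_left(boundaries, v) for v in values]


# Hypothetical pitch values (Hz) and bin boundaries, for illustration only.
boundaries = [100.0, 200.0, 300.0]
pitch = [50.0, 100.0, 150.0, 250.0, 400.0]
print(bucketize(pitch, boundaries))  # [0, 0, 1, 2, 3]
```

A value equal to a boundary lands in the lower bin, and values above the last boundary get the index `len(boundaries)`.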

Installing other requirements :

pip install -r requirements.txt

To use TensorBoard, install tensorboard version 1.14.0 separately, along with a supported TensorFlow version (1.14.0).

For Preprocessing :

The filelists folder contains LJSpeech dataset files already processed with MFA (Montreal Forced Aligner), so you don't need to align text with audio (to extract durations) for the LJSpeech dataset. For other datasets, follow the instructions here. For the remaining pre-processing, run the following command :

python nvidia_preprocessing.py -d path_of_wavs

To find the min and max of F0 and energy, run :

python compute_statistics.py

Update the following values in hparams.py with the min and max of F0 and energy :

p_min = Min F0/pitch
p_max = Max F0
e_min = Min energy
e_max = Max energy
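The internals of compute_statistics.py are not shown here, so the following is only a minimal sketch of the idea: scan every frame of every utterance and keep the global minimum and maximum. The `global_min_max` helper and the sample contours are hypothetical, and the real script may filter frames (for example unvoiced ones) differently.

```python
def global_min_max(utterances):
    """Return (min, max) over all frames of all utterances.

    `utterances` is an iterable of per-utterance sequences of floats.
    Raises ValueError if no frames are present.
    """
    lo, hi = float("inf"), float("-inf")
    for frames in utterances:
        for value in frames:
            lo = min(lo, value)
            hi = max(hi, value)
    if lo > hi:
        raise ValueError("no frames found")
    return lo, hi


# Hypothetical per-utterance F0 and energy contours, for illustration only.
f0 = [[110.0, 120.5, 98.2], [210.0, 180.3]]
energy = [[0.02, 0.31, 0.12], [0.40, 0.05]]

p_min, p_max = global_min_max(f0)
e_min, e_max = global_min_max(energy)
print(f"p_min = {p_min}\np_max = {p_max}\ne_min = {e_min}\ne_max = {e_max}")
```

The printed lines follow the same `name = value` form as the hparams.py entries above, so they can be copied over by hand.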

Training :

[WIP]

Citations :

@misc{chen2021adaspeech,
      title={AdaSpeech: Adaptive Text to Speech for Custom Voice}, 
      author={Mingjian Chen and Xu Tan and Bohan Li and Yanqing Liu and Tao Qin and Sheng Zhao and Tie-Yan Liu},
      year={2021},
      eprint={2103.00993},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

@misc{yan2021adaspeech,
      title={AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data}, 
      author={Yuzi Yan and Xu Tan and Bohan Li and Tao Qin and Sheng Zhao and Yuan Shen and Tie-Yan Liu},
      year={2021},
      eprint={2104.09715},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}


Owner

Rishikesh (ऋषिकेश)
