AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

Last update: Dec 28, 2022

Overview

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data [WIP]

Unofficial PyTorch implementation of AdaSpeech 2.

Requirements:

All code is written in Python 3.6.2.

Install PyTorch

Before installing PyTorch, check your CUDA version by running the following command: nvcc --version

pip install torch torchvision

This repo uses PyTorch 1.6.0 for the torch.bucketize function, which is not available in earlier versions of PyTorch.

Installing other requirements:
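A quick way to confirm the installed version is new enough is sketched below. The helper names (version_tuple, supports_bucketize) are made up for this example and are not part of the repo; the 1.6.0 threshold comes from the note above.

```python
import re

# Minimum PyTorch release that ships torch.bucketize, per the note above.
MIN_TORCH = (1, 6, 0)


def version_tuple(version: str) -> tuple:
    """Parse a version string such as '1.6.0+cu101' into (1, 6, 0).

    Local build suffixes after '+' and pre-release letters are ignored.
    """
    core = version.split("+")[0]
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def supports_bucketize(version: str) -> bool:
    """Return True if this PyTorch version is at least MIN_TORCH."""
    return version_tuple(version) >= MIN_TORCH


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        status = "OK" if supports_bucketize(torch.__version__) else "too old"
        print(torch.__version__, status)
```

Comparing tuples of integers avoids the string-comparison trap where "1.10.0" sorts before "1.6.0".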

pip install -r requirements.txt

To use TensorBoard, install tensorboard version 1.14.0 separately, along with a supported TensorFlow version (1.14.0).

For preprocessing:

The filelists folder contains LJSpeech dataset files already processed with MFA (Montreal Forced Aligner), so you don't need to align text with audio (to extract durations) for the LJSpeech dataset. For other datasets, follow the instructions here. For the remaining pre-processing, run the following command:

python nvidia_preprocessing.py -d path_of_wavs
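For datasets other than LJSpeech, the aligner output has to be turned into per-phone durations measured in mel frames. The sketch below shows one common way to do that. It is not taken from this repo: the function name, the 22050 Hz sample rate and the hop length of 256 are assumptions for illustration, so use the values from your own hparams.py.

```python
def intervals_to_durations(intervals, sample_rate=22050, hop_length=256):
    """Convert (phone, start_sec, end_sec) intervals to per-phone frame counts.

    Both boundaries are rounded to the frame grid before subtracting, so the
    durations always sum to the rounded end frame of the last phone.
    sample_rate and hop_length are assumed values, not read from the repo.
    """
    phones, durations = [], []
    for phone, start, end in intervals:
        start_frame = int(round(start * sample_rate / hop_length))
        end_frame = int(round(end * sample_rate / hop_length))
        phones.append(phone)
        durations.append(end_frame - start_frame)
    return phones, durations


# Hypothetical alignment for a short word, times in seconds.
example = [
    ("HH", 0.00, 0.08),
    ("AH", 0.08, 0.20),
    ("L", 0.20, 0.29),
    ("OW", 0.29, 0.50),
]
phones, durations = intervals_to_durations(example)
print(list(zip(phones, durations)))
```

Rounding each boundary instead of each interval length keeps rounding errors from accumulating, so the total number of frames still matches the mel spectrogram length up to one frame.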

To find the min and max of F0 and energy:

python compute_statistics.py
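What such a statistics pass computes can be sketched in a few lines. This is only an illustration and not the contents of compute_statistics.py: the function name is hypothetical, and skipping zero-valued F0 frames (a common marker for unvoiced frames) is an assumption about the data.

```python
import math


def global_min_max(values_per_utterance, skip_zeros=False):
    """Return (min, max) over all frames of all utterances.

    With skip_zeros=True, frames equal to 0 are ignored. That is an assumed
    convention for unvoiced F0 frames, not something confirmed by the repo.
    """
    lo, hi = math.inf, -math.inf
    for values in values_per_utterance:
        for value in values:
            if skip_zeros and value == 0:
                continue
            lo = min(lo, value)
            hi = max(hi, value)
    if lo == math.inf:
        raise ValueError("no usable frames found")
    return lo, hi


# Made-up per-utterance contours, for illustration only.
f0 = [[0.0, 110.5, 220.0], [0.0, 95.2, 180.3]]
energy = [[0.02, 1.3], [0.5, 2.7]]

p_min, p_max = global_min_max(f0, skip_zeros=True)
e_min, e_max = global_min_max(energy)
print(p_min, p_max, e_min, e_max)
```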

Update the following in hparams.py with the min and max of F0 and energy:

p_min = Min F0/pitch
p_max = Max F0
e_min = Min energy
e_max = Max energy
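These ranges matter because models in this family typically quantize pitch and energy into a fixed number of bins before looking up an embedding, which is where torch.bucketize comes in. The sketch below mirrors that step in plain Python. The linear spacing, the bin count and the sample values are assumptions for illustration, not settings read from hparams.py.

```python
from bisect import bisect_left


def linear_boundaries(v_min, v_max, n_bins):
    """Return n_bins - 1 evenly spaced boundaries from v_min to v_max."""
    if n_bins < 3:
        raise ValueError("n_bins must be at least 3")
    step = (v_max - v_min) / (n_bins - 2)
    return [v_min + i * step for i in range(n_bins - 1)]


def quantize(values, boundaries):
    """Map each value to a bin index in the range 0..len(boundaries).

    bisect_left gives the same index as torch.bucketize(values, boundaries)
    with its default right=False.
    """
    return [bisect_left(boundaries, value) for value in values]


# Hypothetical range and bin count, chosen so the result is easy to check.
p_min, p_max, n_bins = 0.0, 300.0, 5
boundaries = linear_boundaries(p_min, p_max, n_bins)
print(boundaries)
print(quantize([-5.0, 0.0, 50.0, 100.0, 250.0, 400.0], boundaries))
```

Values below p_min fall into the first bin and values above p_max into the last one, so a range that is too narrow silently collapses many frames into the edge bins.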

Training:

[WIP]

Citations:

@misc{chen2021adaspeech,
      title={AdaSpeech: Adaptive Text to Speech for Custom Voice}, 
      author={Mingjian Chen and Xu Tan and Bohan Li and Yanqing Liu and Tao Qin and Sheng Zhao and Tie-Yan Liu},
      year={2021},
      eprint={2103.00993},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

@misc{yan2021adaspeech,
      title={AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data}, 
      author={Yuzi Yan and Xu Tan and Bohan Li and Tao Qin and Sheng Zhao and Yuan Shen and Tie-Yan Liu},
      year={2021},
      eprint={2104.09715},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

Owner

Rishikesh (ऋषिकेश)