PyTorch implementation of Tacotron

Last update: Dec 02, 2022

Overview

Tacotron-pytorch

A PyTorch implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model.

Requirements

Install Python 3
Install PyTorch == 0.2.0
Install the remaining requirements:
```
pip install -r requirements.txt
```

Data

I used the LJSpeech dataset, which consists of pairs of text transcripts and wav files. The complete dataset (13,100 pairs) can be downloaded here. I referred to https://github.com/keithito/tacotron for the preprocessing code.
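
As a rough illustration of how the text/wav pairs are laid out, the sketch below reads the `metadata.csv` index shipped with LJSpeech, in which each line holds a clip ID, the raw transcript, and a normalized transcript separated by `|`. The function name `load_ljspeech_pairs` and the choice to prefer the normalized transcript are assumptions made for illustration; this is not the loading code in data.py.

```python
import csv
import os


def load_ljspeech_pairs(data_path):
    """Return (wav_path, text) pairs from an extracted LJSpeech directory."""
    pairs = []
    metadata = os.path.join(data_path, "metadata.csv")
    with open(metadata, encoding="utf-8", newline="") as f:
        # Fields are pipe-separated; quoting is disabled because
        # transcripts may contain raw quote characters.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            clip_id = row[0]
            # Prefer the normalized transcript, fall back to the raw one.
            text = row[2] if len(row) > 2 else row[1]
            wav_path = os.path.join(data_path, "wavs", clip_id + ".wav")
            pairs.append((wav_path, text))
    return pairs
```

Calling `load_ljspeech_pairs` on the directory you extracted the dataset into should yield one pair per line of `metadata.csv`.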

File description

hyperparams.py contains all the hyperparameters that are needed.
data.py loads the training data and preprocesses text into index sequences and wav files into spectrograms. The preprocessing code for text is in the text/ directory.
module.py contains all the building blocks, including CBHG, highway, prenet, and so on.
network.py contains the networks, including the encoder, decoder, and post-processing network.
train.py is for training.
synthesis.py is for generating TTS samples.
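
To make the "text to index" step mentioned for data.py more concrete, here is a minimal character-level sketch. The symbol table, the `PAD`/`EOS` markers, and the function names `text_to_sequence` and `pad_batch` are all hypothetical; the actual code in the text/ directory may use a different symbol set and additional text cleaners.

```python
# Hypothetical symbol table: padding, end-of-sequence, then lowercase
# letters, space, and basic punctuation.
PAD, EOS = "_", "~"
SYMBOLS = [PAD, EOS] + list("abcdefghijklmnopqrstuvwxyz !'(),-.:;?")
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def text_to_sequence(text):
    """Map text to integer IDs, dropping unknown characters and appending EOS."""
    ids = [
        SYMBOL_TO_ID[ch]
        for ch in text.lower()
        if ch in SYMBOL_TO_ID and ch not in (PAD, EOS)
    ]
    ids.append(SYMBOL_TO_ID[EOS])
    return ids


def pad_batch(sequences):
    """Right-pad variable-length ID lists with the PAD ID to a common length."""
    max_len = max(len(seq) for seq in sequences)
    pad_id = SYMBOL_TO_ID[PAD]
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
```

Padding to a common length is what lets sentences of different lengths be stacked into a single batch tensor for the encoder.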

Training the network

STEP 1. Download and extract the LJSpeech data into any directory you want.
STEP 2. Adjust the hyperparameters in hyperparams.py, especially 'data_path', which is the directory where you extracted the files. Change the others if necessary.
STEP 3. Run train.py.
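
Before starting a long training run, it can help to confirm that 'data_path' really points at an extracted LJSpeech directory. The sketch below is one possible check, assuming the standard LJSpeech layout (a `metadata.csv` file next to a `wavs/` folder); the helper name `check_data_path` is hypothetical and is not part of this repository.

```python
import os


def check_data_path(data_path):
    """Return a list of problems found in an extracted LJSpeech directory."""
    problems = []
    if not os.path.isfile(os.path.join(data_path, "metadata.csv")):
        problems.append("metadata.csv not found")
    wav_dir = os.path.join(data_path, "wavs")
    if not os.path.isdir(wav_dir):
        problems.append("wavs/ directory not found")
    elif not any(name.endswith(".wav") for name in os.listdir(wav_dir)):
        problems.append("wavs/ contains no .wav files")
    return problems
```

An empty list means the layout looks as expected; anything else usually indicates that 'data_path' points one directory level too high or too low.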

Generate TTS wav file

STEP 1. Run synthesis.py. Make sure the restore step is set to the training step of the checkpoint you want to load.
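
To pick a valid restore step, you can list the steps for which a checkpoint exists. The sketch below assumes checkpoints are saved as `checkpoint_<step>.pth.tar` in a single directory; both the file name pattern and the helper name `available_steps` are assumptions, so adjust the pattern to whatever train.py actually writes.

```python
import os
import re

# Assumed naming scheme; change this to match the files train.py produces.
CHECKPOINT_PATTERN = re.compile(r"^checkpoint_(\d+)\.pth\.tar$")


def available_steps(checkpoint_dir):
    """Return the sorted training steps that have a saved checkpoint."""
    steps = []
    for name in os.listdir(checkpoint_dir):
        match = CHECKPOINT_PATTERN.match(name)
        if match:
            steps.append(int(match.group(1)))
    return sorted(steps)
```

The last element of the returned list is the most recent checkpoint, which is usually the one to restore for synthesis.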

Samples

You can check the generated samples in the 'samples/' directory. The model was trained for only 60K steps, so the quality is not good yet.

Reference

Keith Ito: https://github.com/keithito/tacotron

Comments

Any comments on the code are always welcome.

Owner

soobin seo
