PyTorch implementation of Tacotron speech synthesis model.

Last update: Dec 09, 2022

Overview

tacotron_pytorch


Inspired by keithito/tacotron. The speech quality is currently not as good as what keithito/tacotron can generate, but the model seems to be basically working. You can find some generated speech examples trained on the LJ Speech Dataset here.

If you are comfortable working with TensorFlow, I'd recommend trying https://github.com/keithito/tacotron instead. I rewrote it in PyTorch because, at least for me, it's easier to debug and extend (multi-speaker architecture, etc.).

Requirements

PyTorch
TensorFlow (only needed to run the training script; this could be made optional, but for now it is required)

Installation

git clone --recursive https://github.com/r9y9/tacotron_pytorch
pip install -e . # or python setup.py develop

To run the training script, you also need to install additional dependencies:

pip install -e ".[train]"

Training

The package relies on keithito/tacotron (added as a submodule) for text processing, audio preprocessing and audio reconstruction. Please follow the quick start section at https://github.com/keithito/tacotron and prepare your dataset accordingly.

Once your data is prepared, and assuming it is in "~/tacotron/training" (the default), you can train your model with:

python train.py
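Before launching training, it can help to sanity-check the prepared data. The sketch below assumes that keithito's preprocessing step wrote a `train.txt` metadata file into the training directory, one utterance per line as pipe-separated `spectrogram file|mel file|number of frames|text`. That field layout, the `Utterance` type and the `load_metadata` helper are assumptions for illustration, not part of this package.

```python
import os
from typing import List, NamedTuple


class Utterance(NamedTuple):
    # One line of the (assumed) train.txt metadata file.
    spec_path: str
    mel_path: str
    n_frames: int
    text: str


def load_metadata(data_root: str = "~/tacotron/training") -> List[Utterance]:
    """Parse train.txt under data_root (hypothetical helper).

    Assumes each line looks like: spec.npy|mel.npy|n_frames|text
    """
    root = os.path.expanduser(data_root)
    utterances = []
    with open(os.path.join(root, "train.txt"), encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # maxsplit=3 keeps any '|' inside the transcript intact
            spec, mel, n_frames, text = line.split("|", 3)
            utterances.append(Utterance(os.path.join(root, spec),
                                        os.path.join(root, mel),
                                        int(n_frames), text))
    return utterances
```

For example, `meta = load_metadata()` followed by `sum(u.n_frames for u in meta)` gives the total number of frames, and checking `os.path.exists(u.spec_path)` for each entry catches missing feature files before a long training run.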

Alignments, predicted spectrograms, target spectrograms, predicted waveforms and checkpoints (model and optimizer states) are saved every 1000 global steps in the checkpoints directory. Training progress can be monitored with:

tensorboard --logdir=log
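Saving "every 1000 global steps" amounts to a modulo check on the step counter. Below is a minimal sketch of such a checkpoint, assuming a dictionary with `state_dict`, `optimizer` and `global_step` keys; the key names, the `save_checkpoint` helper and the `checkpoint_step{N}.pth` filename pattern are illustrative assumptions, not necessarily what `train.py` does.

```python
import os

import torch
from torch import nn


def save_checkpoint(model: nn.Module, optimizer: torch.optim.Optimizer,
                    global_step: int, checkpoint_dir: str = "checkpoints",
                    interval: int = 1000):
    """Save model and optimizer states every `interval` global steps.

    Returns the path written, or None when this is not a checkpoint step.
    (Hypothetical helper; key names and filename pattern are assumptions.)
    """
    # Skip step 0 and any step that is not a multiple of the interval.
    if global_step == 0 or global_step % interval != 0:
        return None
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"checkpoint_step{global_step}.pth")
    # Storing the optimizer state as well lets training resume exactly.
    torch.save({
        "state_dict": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "global_step": global_step,
    }, path)
    return path
```

Calling this once per iteration inside the training loop is enough; it is a no-op on all but every `interval`-th step.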

Testing model

Open the notebook in the notebooks directory and set checkpoint_path to the path of your model checkpoint.
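Restoring a checkpoint for synthesis typically follows the standard PyTorch `load_state_dict` pattern. The sketch below assumes the checkpoint is a dictionary holding the model weights under a `state_dict` key; that key name and the `load_model` helper are assumptions. The model passed in must be constructed with the same hyperparameters used for training, otherwise the parameter shapes will not match.

```python
import torch
from torch import nn


def load_model(model: nn.Module, checkpoint_path: str) -> nn.Module:
    """Restore weights from a checkpoint for inference (hypothetical helper).

    Assumes the checkpoint is a dict whose model weights live under "state_dict".
    """
    # map_location="cpu" lets a GPU-trained checkpoint load on a CPU-only machine.
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(checkpoint["state_dict"])
    # Switch modules such as nn.Dropout to evaluation behavior.
    model.eval()
    return model
```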

Owner

Ryuichi Yamamoto
