Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021

Last update: Sep 28, 2022

Overview

FCL-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech synthesis (ICASSP 2021) Paper | Demo

Block diagram of FCL-taco2, where the decoder generates mel-spectrograms in AR mode within each phoneme and is shared for all phonemes.
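The decoding scheme in the caption can be illustrated with a toy sketch. This is not the model code from this repo: `decode_utterance`, `step`, and the scalar "frames" are hypothetical stand-ins, and resetting the autoregressive history to zero at each phoneme boundary is an assumption made for illustration. It only shows the control flow: one shared step function, run autoregressively inside each phoneme, with no dependency between phonemes.

```python
def decode_utterance(phoneme_states, durations, step):
    """Toy semi-autoregressive decoding loop (illustrative only).

    phoneme_states: one conditioning value per phoneme (hypothetical scalars).
    durations:      number of frames to generate for each phoneme.
    step:           the shared decoder step, called as step(state, previous_frame).
    """
    frames = []
    for state, n_frames in zip(phoneme_states, durations):
        prev = 0.0  # assumption: AR history restarts at every phoneme boundary
        for _ in range(n_frames):
            prev = step(state, prev)  # AR: each frame depends on the previous one
            frames.append(prev)
    return frames


# The same step function is reused for every phoneme.
toy_step = lambda state, prev: state + prev
print(decode_utterance([1.0, 10.0], [2, 3], toy_step))
```

Because no phoneme depends on another phoneme's frames in this sketch, the outer loop could in principle be run for all phonemes at once.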

💬 Huawei Noah's Ark Lab is recruiting interns in speech processing. If you're interested, you're welcome to contact Dr. Deng: [email protected]

Training and inference scripts for FCL-taco2

Environment

python 3.6.10
torch 1.3.1
chainer 6.0.0
espnet 8.0.0
apex 0.1
numpy 1.19.1
kaldiio 2.15.1
librosa 0.8.0
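A quick way to compare an existing environment against the pins above is sketched below. The helper names (`installed_version`, `check_environment`) are made up for this example and are not part of the repo. The sketch also assumes that each package's distribution name matches the name in the list, which may not hold for every package (apex in particular is often installed from source).

```python
import sys

# Pinned versions copied from the Environment list above.
REQUIRED = {
    "torch": "1.3.1",
    "chainer": "6.0.0",
    "espnet": "8.0.0",
    "apex": "0.1",
    "numpy": "1.19.1",
    "kaldiio": "2.15.1",
    "librosa": "0.8.0",
}


def installed_version(name):
    """Return the installed version string of a distribution, or None."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        try:
            import pkg_resources  # setuptools fallback for Python 3.6/3.7
        except ImportError:
            return None
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_environment(required=REQUIRED, lookup=installed_version):
    """Return {package: (wanted, found)} for every package that does not match."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(the list above pins 3.6.10)")
    for name, (wanted, found) in check_environment().items():
        print(f"{name}: wanted {wanted}, found {found}")
```

An empty result from `check_environment()` means every listed package matches its pinned version exactly.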

Training and inference:

Step1. Data preparation & preprocessing

Download LJSpeech
Unpack the downloaded LJSpeech-1.1.tar.bz2 to /xx/LJSpeech-1.1
Obtain forced-alignment information with the Montreal Forced Aligner tool, or download our alignment results and unpack them to /xx/TextGrid
Preprocess the dataset to extract mel-spectrograms, phoneme durations, pitch, energy and phoneme sequences by running:
```
 python preprocessing.py --data-root /xx/LJSpeech-1.1 --textgrid-root /xx/TextGrid
```
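To make the "phoneme duration" feature concrete, here is a minimal sketch of how aligned phoneme intervals (in seconds) can be turned into per-phoneme frame counts. It is not taken from `preprocessing.py`: the function name is hypothetical, and the sample rate of 22050 Hz and hop length of 256 samples are assumed values, so check the repo's own configuration for the real ones.

```python
def intervals_to_frame_durations(intervals, sample_rate=22050, hop_length=256):
    """Convert (start_sec, end_sec, phoneme) intervals to frame counts.

    sample_rate and hop_length are assumed defaults, not values from the repo.
    Rounding the boundaries (rather than the lengths) keeps the total number of
    frames consistent with the end time of the last interval.
    """
    durations = []
    for start, end, _phoneme in intervals:
        start_frame = int(round(start * sample_rate / hop_length))
        end_frame = int(round(end * sample_rate / hop_length))
        durations.append(end_frame - start_frame)
    return durations


# Hypothetical alignment for a short utterance.
alignment = [(0.0, 0.1, "HH"), (0.1, 0.25, "AH"), (0.25, 0.5, "sil")]
print(intervals_to_frame_durations(alignment))
```

The resulting durations sum to the number of mel-spectrogram frames implied by the final boundary, which is the property a duration-based model needs when expanding phonemes to frames.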

Step2. Model training

Training teacher model FCL-taco2-T:
```
 ./teacher_model_training.sh
```
Training student model FCL-taco2-S:
```
 ./student_model_training.sh
```
Parallel-WaveGAN vocoder training: follow the instructions here. You can also download the pre-trained PWG vocoder and put the PWG model under the directory "vocoder".

Step3. Model evaluation

FCL-taco2-T evaluation:
```
 ./inference_teacher.sh
```
FCL-taco2-S evaluation:
```
 ./inference_student.sh
```

Citation

If the code is used in your research, please star our repo and cite our paper:

@inproceedings{wang2021fcl,
  title={Fcl-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech Synthesis},
  author={Wang, Disong and Deng, Liqun and Zhang, Yang and Zheng, Nianzu and Yeung, Yu Ting and Chen, Xiao and Liu, Xunying and Meng, Helen},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5714--5718},
  year={2021},
  organization={IEEE}
}

Owner

Disong Wang
