Neural HMMs are all you need (for high-quality attention-free TTS)

Last update: Oct 28, 2022

Shivam Mehta, Éva Székely, Jonas Beskow, and Gustav Eje Henter

This is the official code repository for the paper "Neural HMMs are all you need (for high-quality attention-free TTS)". For audio examples, visit our demo page. A pre-trained model is also available.

Setup and training using LJ Speech

Download and extract the LJ Speech dataset. Place it in the data folder so that the directory becomes data/LJSpeech-1.1. If you place it elsewhere, update the filelists in data/filelists accordingly.
Clone this repository: git clone https://github.com/shivammehta007/Neural-HMM.git
- If you are training on a single GPU, check out the branch gradient_checkpointing, which helps fit a larger batch size during training.
Initialise the submodules: git submodule init; git submodule update
Make sure you have Docker installed and running.
- Using Docker is recommended, since it manages the CUDA runtime libraries and the Python dependencies specified in the Dockerfile.
- Alternatively, if you do not intend to use Docker, install the dependencies with pip install -r requirements.txt
Run bash start.sh, which installs all the dependencies and runs the container.
Check src/hparams.py for hyperparameters and set GPUs.
1. For multi-GPU training, set GPUs to [0, 1 ..]
2. For CPU training (not recommended), set GPUs to an empty list []
3. Check the location of transcriptions
Run python train.py to train the model.
1. Checkpoints will be saved in the hparams.checkpoint_dir.
2. TensorBoard logs will be saved in the hparams.tensorboard_log_dir.
To resume training, run python train.py -c <CHECKPOINT_PATH>
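
Before starting a training run, it can help to confirm that the dataset ended up where the first step expects it. The following is a minimal sketch, not part of the repository: the helper name missing_ljspeech_items is hypothetical, and it only assumes the standard LJ Speech 1.1 layout (a metadata.csv file and a wavs folder) under data/LJSpeech-1.1.

```python
from pathlib import Path


def missing_ljspeech_items(root="data/LJSpeech-1.1"):
    """Return the expected LJ Speech entries that are absent under `root`.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    root = Path(root)
    # The LJ Speech 1.1 archive unpacks to a metadata.csv file and a wavs/ folder.
    expected = {
        "metadata.csv": (root / "metadata.csv").is_file(),
        "wavs": (root / "wavs").is_dir(),
    }
    return [name for name, present in expected.items() if not present]


if __name__ == "__main__":
    missing = missing_ljspeech_items()
    if missing:
        print("Missing under data/LJSpeech-1.1:", ", ".join(missing))
    else:
        print("LJ Speech layout looks as expected.")
```

If the dataset lives elsewhere, pass that path as root and remember that the filelists in data/filelists must point to the same location.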

Synthesis

Download our pre-trained LJ Speech model. (This is the exact same model as system NH2 in the paper, but with training continued until reaching 200k updates total.)
Download Nvidia's WaveGlow model.
Run jupyter notebook and open synthesis.ipynb.

Miscellaneous

Mixed-precision training or full-precision training

In src/hparams.py, change hparams.precision to 16 for mixed precision or 32 for full precision.

Multi-GPU training or single-GPU training

Since the code uses PyTorch Lightning, providing more than one element in the list of GPUs enables multi-GPU training. Change hparams.gpus to, for example, [0, 1, 2] for multi-GPU training, or to a single element such as [0] for single-GPU training.
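
The two settings above interact, so a quick sanity check of the values before launching can save a failed run. The sketch below is illustrative only: describe_training_setup is a hypothetical helper that is not in the repository, and it simply restates the rules from this README (an empty list means CPU, one element means single-GPU, more elements mean multi-GPU; 16 is mixed and 32 is full precision).

```python
def describe_training_setup(gpus, precision):
    """Summarise what a given gpus list and precision value select.

    Hypothetical helper for illustration; it does not import the repository
    or PyTorch Lightning, and only encodes the rules stated in the README.
    """
    if precision not in (16, 32):
        raise ValueError("precision should be 16 (mixed) or 32 (full)")
    if len(set(gpus)) != len(gpus):
        raise ValueError("GPU indices should not repeat")

    if not gpus:
        device = "CPU"
    elif len(gpus) == 1:
        device = "single-GPU"
    else:
        device = "multi-GPU"
    mode = "mixed precision" if precision == 16 else "full precision"
    return f"{device}, {mode}"


print(describe_training_setup([0], 16))        # single-GPU, mixed precision
print(describe_training_setup([0, 1, 2], 32))  # multi-GPU, full precision
```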

Known issues/warnings

PyTorch dataloader

If you encounter the warning [W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool), this is a known issue in the PyTorch DataLoader.
It will be fixed when PyTorch releases a new Docker container image with an updated version of Torch. If you are not using Docker, the warning goes away with torch > 1.9.1.
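
To see whether an installed version is past the one mentioned above, the version string can be compared numerically rather than as text. This is a minimal sketch with hypothetical helper names (release_tuple, is_newer_than_1_9_1); it works on a plain version string, so it runs without PyTorch installed.

```python
import re


def release_tuple(version):
    """Turn a version string such as '1.9.1+cu111' into (1, 9, 1)."""
    # Only the leading major.minor[.patch] part is used; suffixes are ignored.
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def is_newer_than_1_9_1(version):
    """Return True if `version` is strictly newer than 1.9.1."""
    return release_tuple(version) > (1, 9, 1)


print(is_newer_than_1_9_1("1.9.1+cu111"))  # False
print(is_newer_than_1_9_1("1.10.0"))       # True
```

With PyTorch installed, the same check can be applied to str(torch.__version__).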

Support

If you have any questions or comments, please open an issue on our GitHub repository.

Citation information

If you use or build on our method or code for your research, please cite our paper:

@article{mehta2021neural,
  title={Neural {HMM}s are all you need (for high-quality attention-free {TTS})},
  author={Mehta, Shivam and Sz{\'e}kely, {\'E}va and Beskow, Jonas and Henter, Gustav Eje},
  journal={arXiv preprint arXiv:2108.13320},
  year={2021}
}

Acknowledgements

The code implementation is based on Nvidia's implementation of Tacotron 2 and uses PyTorch Lightning for boilerplate-free code.
