ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library

ERISHA is a multilingual multispeaker expressive speech synthesis framework. It can transfer expressivity to a speaker's voice for which no expressive speech corpus is available. The term ERISHA means speech in Sanskrit. The framework includes several deep learning architectures for building the prosody encoder: Global Style Token (GST), Variational Autoencoder (VAE), Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors.

Currently, the library is in its initial stage of development and will be updated frequently in the coming days.

Stay tuned for more updates. We are open to collaboration!

Installation and Training

Refer to INSTALL for initial setup.

Available recipes

Available Features

Resampling of speech waveforms to the target sampling rate in recipes
Support for training TTS systems for other languages
Support for training multilingual TTS systems for other languages
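
As an illustration of the resampling step, here is a minimal sketch in plain Python. It is not the code used in the recipes: the function name `resample_linear` is hypothetical, and it uses simple linear interpolation with no anti-aliasing filter, so a real recipe would more likely rely on a dedicated audio resampler.

```python
def resample_linear(samples, orig_sr, target_sr):
    """Resample a mono waveform (list of floats) by linear interpolation.

    Hypothetical helper for illustration only; no low-pass filtering is
    applied, so downsampling with it can introduce aliasing.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if len(samples) == 0 or orig_sr == target_sr:
        return list(samples)

    # Number of output samples covering the same duration.
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Example: bring a short 22050 Hz clip to a 16000 Hz target rate.
clip = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5] * 100
clip_16k = resample_linear(clip, 22050, 16000)
```

Resampling every corpus to one target rate keeps the acoustic features consistent across speakers and languages before training.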

Upcoming updates

User documentation
PyTorch Lightning
Multiclass N-pair loss
Cluster sampling for improving latent representation of speaker and expressivity (proposed work)
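
The multiclass N-pair loss listed above is not yet part of the library. For reference, the following is a minimal sketch of the loss in its commonly used form, log(1 + Σ exp(a·n_i − a·p)), for a single anchor. The function names and the use of plain Python lists as embeddings are assumptions for illustration; how ERISHA will apply the loss to speaker or expressivity embeddings is not specified here.

```python
import math


def dot(u, v):
    """Inner product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def n_pair_loss(anchor, positive, negatives):
    """Multiclass N-pair loss for one anchor (illustrative sketch).

    anchor, positive: embeddings from the same class.
    negatives: embeddings, one from each of the other classes.
    """
    pos_score = dot(anchor, positive)
    # Each negative contributes exp(similarity to negative - similarity to positive).
    total = sum(math.exp(dot(anchor, neg) - pos_score) for neg in negatives)
    return math.log1p(total)


# Example: an anchor close to its positive and far from two negatives.
loss = n_pair_loss([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
```

The loss shrinks as the anchor becomes more similar to its positive than to every negative, which is why it is used to pull embeddings of the same class together.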

Acknowledgements

This implementation uses code from the following repos: NVIDIA, Keith Ito, Prem Seetharaman, Chengqi Deng, Dannynis, Jhosimar George Arias Figueroa.

Owner

Ajinkya Kulkarni