ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library

ERISHA is a multilingual multispeaker expressive speech synthesis framework. It can transfer expressivity to a speaker's voice even when no expressive speech corpus is available for that speaker. The term ERISHA means "speech" in Sanskrit. The framework includes several deep learning architectures for building the prosody encoder: Global Style Token (GST), Variational Autoencoder (VAE), Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors.

The library is currently in an early stage of development and will be updated frequently.

Stay tuned for more updates. We are open to collaboration!

Installation and Training

Refer to INSTALL for initial setup.

Available recipes

Available Features

Resampling of speech waveforms to a target sampling rate in recipes
Support for training a TTS system for other languages
Support for training a multilingual TTS system
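
As an illustration of the resampling feature listed above, the sketch below converts a mono waveform from one sampling rate to another. It is a minimal example under stated assumptions, not ERISHA's own code: `resample_linear` is a hypothetical helper, it uses only the Python standard library, and it relies on plain linear interpolation with no anti-aliasing filter. A real recipe would more likely use a filtered resampler such as `librosa.resample` or `scipy.signal.resample_poly`.

```python
import math


def resample_linear(samples, orig_sr, target_sr):
    """Resample a mono waveform by linear interpolation.

    Hypothetical helper for illustration only. There is no anti-aliasing
    filter, so downsampling real speech this way can introduce aliasing.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples or orig_sr == target_sr:
        return list(samples)

    # Number of output samples that covers the same duration.
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step                       # fractional position in the input
        left = min(int(math.floor(pos)), last)
        right = min(left + 1, last)          # clamp at the final sample
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[right])
    return out


# One second of a 440 Hz tone at 22050 Hz, resampled to 16000 Hz.
tone = [math.sin(2 * math.pi * 440 * n / 22050) for n in range(22050)]
resampled = resample_linear(tone, 22050, 16000)
```

The output length is derived from the duration of the input, so a one-second clip stays one second long at the new rate.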

Upcoming updates

User documentation
PyTorch Lightning
Multiclass N-pair loss
Cluster sampling for improving latent representation of speaker and expressivity (proposed work)

Acknowledgements

This implementation uses code from the following repos: NVIDIA, Keith Ito, Prem Seetharaman, Chengqi Deng, Dannynis, Jhosimar George Arias Figueroa.

Owner

Ajinkya Kulkarni