Fast and Simple Neural Vocoder, the Multiband RNNMS

Last update: Jan 11, 2022

Overview

Multiband RNN_MS

Fast and Simple vocoder, Multiband RNN_MS.

Demo
Quick Training
How to Use
System Details
Results
References

Demo

ToDo: Link a super great, impressive, high-quality audio demo.

Quick Training

Jump to ☞ , then Run. That's all!

How to Use

1. Install

# pip install "torch==1.10.0" -q      # Based on your environment (validated with v1.10)
# pip install "torchaudio==0.10.0" -q # Based on your environment
pip install git+https://github.com/tarepan/MultibandRNNMS

2. Data & Preprocessing

"Batteries Included".
RNNMS transparently downloads the corpus and preprocesses it for you 😉

3. Train

python -m mbrnnms.main_train

For arguments, check ./mbrnnms/config.py

Advanced: Other datasets

You can switch datasets with arguments.
All of speechcorpusy's preset corpora are supported.

# LJSpeech corpus
python -m mbrnnms.main_train data.data_name=LJ
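
The `data.data_name=LJ` argument reads as a dotted `key=value` override of a nested config. The sketch below only illustrates that idea in plain Python. `apply_overrides` and the `defaults` dict are hypothetical names for this example; the repository's real argument handling lives in ./mbrnnms/config.py and may work differently.

```python
# Illustration only: a hypothetical parser, not the repository's actual code.
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply `dotted.key=value` overrides to a nested dict, in place."""
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(name, {})
        # Values stay strings here; a real config library would also cast types.
        node[leaf] = value
    return config


# Hypothetical defaults; the real ones are defined in ./mbrnnms/config.py.
defaults = {"data": {"data_name": "PLACEHOLDER"}}
config = apply_overrides(defaults, ["data.data_name=LJ"])
print(config["data"]["data_name"])  # prints "LJ"
```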

Advanced: Custom dataset

Copy mbrnnms.main_train and replace the DataModule with your own.

    # datamodule = LJSpeechDataModule(batch_size, ...)
    datamodule = YourSuperCoolDataModule(batch_size, ...)
    # That's all!

System Details

Model

PreNet: GRU
Upsampler: time-directional nearest interpolation
Decoder: Embedding-auto-regressive generative RNN with 10-bit μ-law encoding
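
Two of the components above are simple enough to sketch without any framework. The code below is a minimal, framework-free illustration, not the repository's implementation: nearest-neighbor upsampling along time repeats each conditioning frame, and 10-bit μ-law encoding maps a waveform sample in [-1, 1] to one of 1024 class indices. The function names and the `scale` value are assumptions made for this example. The real model works on PyTorch tensors and its upsampling factor is not stated here.

```python
import math
from typing import List, Sequence

MU = 2 ** 10 - 1  # 10-bit mu-law: 1024 classes with indices 0..1023


def upsample_nearest(frames: Sequence[Sequence[float]], scale: int) -> List[List[float]]:
    """Nearest-neighbor upsampling in time: repeat every frame `scale` times."""
    return [list(frame) for frame in frames for _ in range(scale)]


def mu_law_encode(x: float, mu: int = MU) -> int:
    """Compress a sample in [-1, 1] with mu-law and quantize it to 0..mu."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range samples
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(math.floor((compressed + 1.0) / 2.0 * mu + 0.5))


def mu_law_decode(index: int, mu: int = MU) -> float:
    """Invert mu_law_encode up to quantization error."""
    compressed = 2.0 * index / mu - 1.0
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


# Two conditioning frames stretched to six time steps (scale=3 is arbitrary).
frames = [[0.1, 0.2], [0.3, 0.4]]
print(len(upsample_nearest(frames, scale=3)))    # prints 6
print(mu_law_encode(0.0), mu_law_encode(1.0))    # prints 512 1023
```

μ-law spends more quantization levels on quiet samples than on loud ones, which is why a 10-bit categorical output can stand in for higher-resolution linear audio in an autoregressive decoder.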

Results

Output Sample

Demo

Performance

X [iter/sec] @ NVIDIA T4 on Google Colaboratory (AMP+, num_workers=8)

It takes about Y days for full training.

References

Acknowledgements

: Basic vocoder concept came from this paper.
bshall/UniversalVocoding: The model and hyperparameters are derived from this repository. All code has been rewritten.

Owner

tarepan
