A fast and simple vocoder: Multiband RNN_MS.
ToDo: link a high-quality audio demo.
Jump to ☞ , then Run. That's all!
# pip install "torch==1.10.0" -q # Uncomment and adjust to your environment (validated with v1.10.0)
# pip install "torchaudio==0.10.0" -q # Uncomment and adjust to your environment
pip install git+https://github.com/tarepan/MultibandRNNMS
"Batteries Included".
RNNMS transparently download corpus and preprocess it for you 😉
python -m mbrnnms.main_train
For arguments, check ./mbrnnms/config.py.
You can switch datasets via arguments.
All of speechcorpusy's preset corpora are supported.
# LJSpeech corpus
python -m mbrnnms.main_train data.data_name=LJ
Copy mbrnnms.main_train and replace the DataModule.
# datamodule = LJSpeechDataModule(batch_size, ...)
datamodule = YourSuperCoolDataModule(batch_size, ...)
# That's all!
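`YourSuperCoolDataModule` above is a placeholder for your own data module. As a rough sketch, such a class only needs to expose the interface the training script consumes: construct it with a batch size, then iterate training batches. The class and method names below (and the corpus-loading stub) are illustrative assumptions, not part of the MultibandRNNMS API:

```python
# Hypothetical sketch of a custom data module (names are illustrative,
# not the MultibandRNNMS API). It mimics a Lightning-style interface:
# construct with a batch size, then iterate batches from train_dataloader().

class YourSuperCoolDataModule:
    def __init__(self, batch_size, items=None):
        # `items` stands in for your own corpus-loading logic.
        self.batch_size = batch_size
        self.items = items or []

    def setup(self, stage=None):
        # Download / preprocess your corpus here (no-op in this sketch).
        pass

    def train_dataloader(self):
        # Yield fixed-size batches of training items.
        for i in range(0, len(self.items), self.batch_size):
            yield self.items[i : i + self.batch_size]

# Usage: plug it in where LJSpeechDataModule was constructed.
dm = YourSuperCoolDataModule(batch_size=2, items=[1, 2, 3, 4, 5])
dm.setup()
batches = list(dm.train_dataloader())
print(batches)  # [[1, 2], [3, 4], [5]]
```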
- PreNet: GRU
- Upsampler: nearest-neighbor interpolation along the time axis
- Decoder: Embedding-auto-regressive generative RNN with 10-bit μ-law encoding
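The two less-standard pieces above, 10-bit μ-law companding and nearest-neighbor upsampling, can be sketched in plain Python. Function names and exact quantization details are assumptions for illustration, not the repository's implementation:

```python
import math

MU = 2**10 - 1  # 10-bit mu-law -> 1024 quantization levels

def mulaw_encode(x: float) -> int:
    """Compand a sample in [-1, 1] and quantize to an int in [0, 1023]."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int((y + 1) / 2 * MU + 0.5)

def mulaw_decode(idx: int) -> float:
    """Invert the quantization back to a sample in [-1, 1]."""
    y = 2 * idx / MU - 1
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def upsample_nearest(frames, hop_length):
    """Nearest interpolation along time: repeat each frame hop_length times."""
    return [f for f in frames for _ in range(hop_length)]

print(mulaw_encode(0.0))                # 512 (mid-scale)
print(upsample_nearest([0.1, 0.2], 3))  # [0.1, 0.1, 0.1, 0.2, 0.2, 0.2]
```

The companding step allocates more quantization levels near zero, which is why an auto-regressive decoder can get away with a 1024-way categorical output over raw amplitudes.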
X [iter/sec] @ NVIDIA T4 on Google Colaboratory (AMP enabled, num_workers=8)
Full training takes about Y days.
- : The basic vocoder concept comes from this paper.
- bshall/UniversalVocoding: The model and hyperparameters are derived from this repository; all code has been rewritten.