Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

Overview

MLP Singer

Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis. Audio samples are available on our demo page.

Abstract

Recent developments in deep learning have significantly improved the quality of synthesized singing voice audio. However, prominent neural singing voice synthesis systems suffer from slow inference speed due to their autoregressive design. Inspired by MLP-Mixer, a novel architecture introduced in the vision literature for attention-free image classification, we propose MLP Singer, a parallel Korean singing voice synthesis system. To the best of our knowledge, this is the first work that uses an entirely MLP-based architecture for voice synthesis. Listening tests demonstrate that MLP Singer outperforms a larger autoregressive GAN-based system, both in terms of audio quality and synthesis speed. In particular, MLP Singer achieves a real-time factor of up to 200 and 3400 on CPUs and GPUs respectively, enabling order of magnitude faster generation on both environments.

Citation

Please cite this work as follows.

@misc{tae2021mlp,
      title={MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis}, 
      author={Jaesung Tae and Hyeongju Kim and Younggun Lee},
      year={2021},
}

Quickstart

  1. Clone the repository including the git submodule.

    git clone --recurse-submodules https://github.com/neosapience/mlp-singer.git
  2. Install package requirements.

cd mlp-singer
pip install -r requirements.txt
  1. To generate audio files with the trained model checkpoint, download the HiFi-GAN checkpoint along with its configuration file and place them in hifi-gan.

  2. Run inference using the following command. Generated audio samples are saved in the samples directory by default.

    python inference.py --checkpoint_path checkpoints/default/model.pt

Dataset

We used the Children Song Dataset, an open-source singing voice dataset comprised of 100 annotated Korean and English children songs sung by a single professional singer. We used only the Korean subset of the dataset to train the model.

You can train the model on any custom dataset of your choice, as long as it includes lyrics text, midi transcriptions, and monophonic a capella audio file triplets. These files should be titled identically, and should also be placed in specific directory locations as shown below.

├── data
│   └── raw
│       ├── mid
│       ├── txt
│       └── wav

The directory names correspond to file extensions. We have included a sample as reference.

Preprocessing

Once you have prepared the dataset, run

python -m data.serialize

from the root directory. This will create data/bin that contains binary files used for training. This repository already contains example binary files created from the sample in data/raw.

Training

To train the model, run

python train.py

This will read the default configuration file located in configs/model.json to initialize the model. Alternatively, you can also create a new configuration and train the model via

python train.py --config_path PATH/TO/CONFIG.json

Running this command will create a folder under the checkpoints directory according to the name field specified in the configuration file.

You can also continue training from a checkpoint. For example, to resume training from the provided pretrained model checkpoint, run

python train.py --checkpoint_path /checkpoints/default/model.pt

Unless a --config_path flag is explicitly provided, the script will read config.json in the checkpoint directory. In both cases, model checkpoints will be saved regularly according to the interval defined in the configuration file.

Inference

MLP Singer produces mel-spectrograms, which are then fed into a neural vocoder to generate raw waveforms. This repository uses HiFi-GAN as the vocoder backend, but you can also plug other vocoders like WaveGlow. To generate samples, run

python inference.py --checkpoint_path PATH/TO/CHECKPOINT.pt --song little_star

This will create .wav samples in the samples directory, and save mel-spectrogram files as .npy files in hifi-gan/test_mel_dirs.

You can also specify any song you want to perform inference on, as long as the song is present in data/raw. The argument to the --song flag should match the title of the song as it is saved in data/raw.

Note

For demo and internal experiments, we used a variant of HiFi-GAN that used different mel-spectrogram configurations. As such, the provided checkpoint for MLP Singer is different from the one referred to in the paper. Moreover, the vocoder used in the demo was further fine-tuned on the Children's Song Dataset.

Acknowledgements

This implementation was inspired by the following repositories.

License

Released under the MIT License.

Owner
Neosapience
Neosapience, an artificial being enabled by artificial intelligence, will soon be everywhere in our daily lives.
Neosapience
This is a project of data parallel that running on NLP tasks.

This is a project of data parallel that running on NLP tasks.

2 Dec 12, 2021
Python bot created with Selenium that can guess the daily Wordle word correct 96.8% of the time.

Wordle_Bot Python bot created with Selenium that can guess the daily Wordle word correct 96.8% of the time. It will log onto the wordle website and en

Lucas Polidori 15 Dec 11, 2022
SGMC: Spectral Graph Matrix Completion

SGMC: Spectral Graph Matrix Completion Code for AAAI21 paper "Scalable and Explainable 1-Bit Matrix Completion via Graph Signal Learning". Data Format

Chao Chen 8 Dec 12, 2022
Unsupervised Language Model Pre-training for French

FlauBERT and FLUE FlauBERT is a French BERT trained on a very large and heterogeneous French corpus. Models of different sizes are trained using the n

GETALP 212 Dec 10, 2022
Prompt tuning toolkit for GPT-2 and GPT-Neo

mkultra mkultra is a prompt tuning toolkit for GPT-2 and GPT-Neo. Prompt tuning injects a string of 20-100 special tokens into the context in order to

61 Jan 01, 2023
Neural text generators like the GPT models promise a general-purpose means of manipulating texts.

Boolean Prompting for Neural Text Generators Neural text generators like the GPT models promise a general-purpose means of manipulating texts. These m

Jeffrey M. Binder 20 Jan 09, 2023
Code for "Generating Disentangled Arguments with Prompts: a Simple Event Extraction Framework that Works"

GDAP The code of paper "Code for "Generating Disentangled Arguments with Prompts: a Simple Event Extraction Framework that Works"" Event Datasets Prep

45 Oct 29, 2022
Refactored version of FastSpeech2

Refactored version of FastSpeech2. An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

ILJI CHOI 10 May 26, 2022
Predict the spans of toxic posts that were responsible for the toxic label of the posts

toxic-spans-detection An attempt at the SemEval 2021 Task 5: Toxic Spans Detection. The Toxic Spans Detection task of SemEval2021 required participant

Ilias Antonopoulos 3 Jul 24, 2022
Python library for processing Chinese text

SnowNLP: Simplified Chinese Text Processing SnowNLP是一个python写的类库,可以方便的处理中文文本内容,是受到了TextBlob的启发而写的,由于现在大部分的自然语言处理库基本都是针对英文的,于是写了一个方便处理中文的类库,并且和TextBlob

Rui Wang 6k Jan 02, 2023
ACL'22: Structured Pruning Learns Compact and Accurate Models

☕ CoFiPruning: Structured Pruning Learns Compact and Accurate Models This repository contains the code and pruned models for our ACL'22 paper Structur

Princeton Natural Language Processing 130 Jan 04, 2023
An algorithm that can solve the word puzzle Wordle with an optimal number of guesses on HARD mode.

WordleSolver An algorithm that can solve the word puzzle Wordle with an optimal number of guesses on HARD mode. How to use the program Copy this proje

Akil Selvan Rajendra Janarthanan 3 Mar 02, 2022
New Modeling The Background CodeBase

Modeling the Background for Incremental Learning in Semantic Segmentation This is the updated official PyTorch implementation of our work: "Modeling t

Fabio Cermelli 9 Dec 28, 2022
Simple text to phones converter for multiple languages

Phonemizer -- foʊnmaɪzɚ The phonemizer allows simple phonemization of words and texts in many languages. Provides both the phonemize command-line tool

CoML 762 Dec 29, 2022
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge This is an implementation of the paper,

Mutian He 19 Oct 14, 2022
PyTorch implementation of Tacotron speech synthesis model.

tacotron_pytorch PyTorch implementation of Tacotron speech synthesis model. Inspired from keithito/tacotron. Currently not as much good speech quality

Ryuichi Yamamoto 279 Dec 09, 2022
Text Analysis & Topic Extraction on Android App user reviews

AndroidApp_TextAnalysis Hi, there! This is code archive for Text Analysis and Topic Extraction from user_reviews of Android App. Dataset Source : http

Fitrie Ratnasari 1 Feb 14, 2022
PyTorch implementation of NATSpeech: A Non-Autoregressive Text-to-Speech Framework

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

760 Jan 03, 2023
LUKE -- Language Understanding with Knowledge-based Embeddings

LUKE (Language Understanding with Knowledge-based Embeddings) is a new pre-trained contextualized representation of words and entities based on transf

Studio Ousia 587 Dec 30, 2022