Unofficial PyTorch Implementation of Multi-Singer

Last update: Dec 28, 2022

Related tags

Deep Learning Multi-Singer

Overview

Multi-Singer

Unofficial PyTorch Implementation of Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus.

Requirements

See requirements in requirement.txt:

linux
python 3.6
pytorch 1.0+
librosa
json, tqdm, logging

TODO

1026: upload code
1024: implement multi-singer & perceptual loss
1023: implement singer encoder

Getting started

Apply recipe to your own dataset

Put any wav files in data directory
Edit configuration in config/config.yaml

1. Pretrain

Pretrain the Singer Embedding Extractor using repository here, and set the 'enc_model_fpath' in config/config.yaml

Note: Please set params as those in 'encoder/params_data' and 'encoder/params_model'.

2. Preprocess

Extract mel-spectrogram

python preprocess.py -i data/wavs -o data/feature -c config/config.yaml

-i your audio folder

-o output acoustic feature folder

-c config file

3. Train

Training conditioned on mel-spectrogram

python train.py -i data/feature -o checkpoints/ --config config/config.yaml

-i acoustic feature folder

-o directory to save checkpoints

-c config file

4. Inference

python inference.py -i data/feature -o outputs/  -c checkpoints/*.pkl -g config/config.yaml

-i acoustic feature folder

-o directory to save generated speech

-c checkpoints file

-c config file

5. Singing Voice Synthesis

For Singing Voice Synthesis:

Take modified FastSpeech for mel-spectrogram synthesis
Use synthesized mel-spectrogram in Multi-Singer for waveform synthesis.

Acknowledgements

Citation

Please cite this repository by the "Cite this repository" of About section (top right of the main page).

Question

Feel free to contact me at [email protected]

Unofficial PyTorch Implementation of Multi-Singer

Related tags

Overview

Multi-Singer

Requirements

TODO

Getting started

Apply recipe to your own dataset

1. Pretrain

Note: Please set params as those in 'encoder/params_data' and 'encoder/params_model'.

2. Preprocess

3. Train

4. Inference

5. Singing Voice Synthesis

Acknowledgements

Citation

Question

Owner

SunMail-hub

Multi-Scale Geometric Consistency Guided Multi-View Stereo

General Multi-label Image Classification with Transformers

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Code for KHGT model, AAAI2021

Dirty Pixels: Towards End-to-End Image Processing and Perception

Simple machine learning library / 簡單易用的機器學習套件

Human Dynamics from Monocular Video with Dynamic Camera Movements

An implementation of the Contrast Predictive Coding (CPC) method to train audio features in an unsupervised fashion.

This tool converts a Nondeterministic Finite Automata (NFA) into a Deterministic Finite Automata (DFA)

Dynamic Environments with Deformable Objects (DEDO)

Emotion Recognition from Facial Images

IDRLnet, a Python toolbox for modeling and solving problems through Physics-Informed Neural Network (PINN) systematically.

Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

This repository contains the accompanying code for Deep Virtual Markers for Articulated 3D Shapes, ICCV'21

Image Deblurring using Generative Adversarial Networks

Frigate - NVR With Realtime Object Detection for IP Cameras

The software associated with a paper accepted at EMNLP 2021 titled "Open Knowledge Graphs Canonicalization using Variational Autoencoders".

Code for "Layered Neural Rendering for Retiming People in Video."

PyTorch reimplementation of Diffusion Models