This code is an unofficial implementation of HiFiSinger.

Last update: Dec 23, 2022

HiFiSinger

This code is an unofficial implementation of HiFiSinger. The algorithm is based on the following papers:

Chen, J., Tan, X., Luan, J., Qin, T., & Liu, T. Y. (2020). HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis. arXiv preprint arXiv:2009.01776.
Ren, Y., Ruan, Y., Tan, X., Qin, T., Zhao, S., Zhao, Z., & Liu, T. Y. (2019). Fastspeech: Fast, robust and controllable text to speech. Advances in Neural Information Processing Systems, 32, 3171-3180.
Yamamoto, R., Song, E., & Kim, J. M. (2020, May). Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6199-6203). IEEE.

Requirements

Please see 'requirements.txt'.

Structure

Generator

During training, the length regulator uses the target durations.
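As a rough illustration of what a length regulator does with target durations, the sketch below repeats each token-level encoding vector once per frame of its duration. It is a minimal pure-Python sketch, not the repository's code: the function name `length_regulate` and the list-based data layout are assumptions made for illustration, and the actual model operates on batched tensors.

```python
from typing import List, Sequence


def length_regulate(
    encodings: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Expand token-level encodings to frame level.

    Hypothetical helper: encodings[i] is repeated durations[i] times,
    so the output has sum(durations) frames.
    """
    if len(encodings) != len(durations):
        raise ValueError("encodings and durations must have the same length")

    frames: List[List[float]] = []
    for vector, duration in zip(encodings, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector once per frame so frames do not share storage.
        frames.extend(list(vector) for _ in range(duration))
    return frames


# Three tokens; the second one has zero duration and is skipped.
token_encodings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
target_durations = [2, 0, 3]
frames = length_regulate(token_encodings, target_durations)
print(len(frames))
```

In this sketch the durations passed in during training would be the target durations from the dataset rather than the output of the duration predictor.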

Discriminator

HiFiSinger uses a Sub-Frequency GAN (SF-GAN).
In this implementation, the frequency range of each sampled segment is fixed and its length range is randomized.
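One plausible reading of this sampling scheme is sketched below: a fixed band of mel bins is cut out, while the segment length and its start frame are drawn at random. This is a pure-Python sketch under stated assumptions, not the repository's discriminator code; the function name `sample_sub_frequency_segment`, its parameters, and the `[mel_dim][time]` list layout are all hypothetical.

```python
import random
from typing import List, Optional, Tuple


def sample_sub_frequency_segment(
    mel: List[List[float]],
    frequency_range: Tuple[int, int],
    min_length: int,
    max_length: int,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Crop a mel-spectrogram laid out as mel[mel_bin][frame].

    The frequency band is fixed by frequency_range (start inclusive,
    end exclusive); the segment length and offset are random.
    """
    rng = rng or random.Random()
    mel_dim = len(mel)
    total_length = len(mel[0])
    freq_start, freq_end = frequency_range

    if not 0 <= freq_start < freq_end <= mel_dim:
        raise ValueError("frequency_range must lie within the mel dimension")
    if not 1 <= min_length <= max_length <= total_length:
        raise ValueError("length range must fit inside the spectrogram")

    length = rng.randint(min_length, max_length)      # inclusive bounds
    offset = rng.randint(0, total_length - length)    # random start frame
    return [row[offset:offset + length] for row in mel[freq_start:freq_end]]


# Dummy 80-bin, 200-frame spectrogram; values encode (bin, frame).
mel = [[float(f * 1000 + t) for t in range(200)] for f in range(80)]
segment = sample_sub_frequency_segment(mel, (0, 40), 32, 64, random.Random(0))
print(len(segment), len(segment[0]))
```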

Used dataset

The code was verified with a small, private Korean dataset.
- Thus, the current Pattern_Generator.py and Datasets.py are based on Korean.
Please let me know about any available open source dataset.
- It should be a set of MIDI files with synchronized lyrics and high-resolution vocal wave files.

Hyper parameters

Before proceeding, please set the pattern, inference, and checkpoint paths in 'Hyper_Parameters.yaml' according to your environment.

Sound
- Setting basic sound parameters.
Tokens
- The number of lyric tokens.
Max_Note
- The highest note value for embedding.
Min/Max duration
- The mel length range that the model uses.
- Min duration is used only during pattern generation.
Encoder
- Setting the encoder.
Duration_Predictor
- Setting for the duration predictor.
Decoder
- Setting for decoder.
Discriminator
- Setting for the discriminator.
- In the frequency range, each frequency is an index into the mel dimension.
  - The index must be equal to or less than Sound.Mel_Dim.
Vocoder_Path
- Setting the traced vocoder path.
- To generate this, please check Here
Train
- Setting the parameters of training.
Use_Mixed_Precision
- Setting whether to use mixed precision.
- Requires NVIDIA Apex.
Inference_Batch_Size
- Setting the batch size used during inference.
Inference_Path
- Setting the inference path
Checkpoint_Path
- Setting the checkpoint path
Log_Path
- Setting the tensorboard log path
Device
- Setting which GPU device is used in a multi-GPU environment.
- Or, if using only the CPU, please set '-1'. (But I don't recommend this for training.)
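The constraint on the discriminator's frequency range can be checked before training starts. The sketch below is a minimal example under assumptions: `Sound.Mel_Dim` is the setting named above, but the key `Frequency_Range`, the nested-dict layout, the value 80, and the helper `check_frequency_ranges` are hypothetical and may not match the real 'Hyper_Parameters.yaml'.

```python
from typing import List, Sequence


def check_frequency_ranges(
    mel_dim: int,
    frequency_ranges: Sequence[Sequence[int]],
) -> List[str]:
    """Return a list of human-readable problems (empty if all ranges are valid)."""
    problems: List[str] = []
    for index, (start, end) in enumerate(frequency_ranges):
        if start < 0 or end > mel_dim:
            problems.append(
                f"range {index} [{start}, {end}] exceeds mel dimension {mel_dim}"
            )
        elif start >= end:
            problems.append(f"range {index} [{start}, {end}] is empty")
    return problems


# Hypothetical values as they might look after loading the YAML file.
hyper_parameters = {
    "Sound": {"Mel_Dim": 80},
    "Discriminator": {"Frequency_Range": [[0, 40], [20, 60], [40, 90]]},
}

problems = check_frequency_ranges(
    hyper_parameters["Sound"]["Mel_Dim"],
    hyper_parameters["Discriminator"]["Frequency_Range"],
)
for problem in problems:
    print(problem)
```

Here the third range is reported because its upper index, 90, is larger than the assumed mel dimension of 80.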

Generate pattern

There is no available open source dataset.

Inference file path while training for verification.

Inference_for_Training
- There are two examples for inference.
- Each one is a MIDI-file-based script.

Run

Command

python Train.py -hp <path> -s <int>

-hp
- The hyper parameter file path.
- This is required.
-s
- The resume step parameter.
- Default is 0.
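The two options described above could be declared with `argparse` roughly as follows. This is a sketch, not the contents of Train.py: only the short flags `-hp` and `-s`, the required status of `-hp`, and the default of 0 come from this document; the long option names and help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    # -hp is required: path to the hyper parameter YAML file.
    parser.add_argument(
        "-hp", "--hyper_parameters",  # long name is hypothetical
        required=True,
        type=str,
        help="The hyper parameter file path.",
    )
    # -s is optional: the step to resume from, 0 meaning a fresh run.
    parser.add_argument(
        "-s", "--steps",  # long name is hypothetical
        default=0,
        type=int,
        help="The resume step parameter.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-hp", "Hyper_Parameters.yaml"])
print(args.hyper_parameters, args.steps)
```

Omitting `-hp` makes `argparse` exit with a usage error, which matches the "required" note above.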

Owner

Heejo You
