A 16kHz implementation of HiFi-GAN for soft-vc.

Last update: Dec 27, 2022

Overview

HiFi-GAN

A 16kHz implementation of HiFi-GAN for soft-vc.

Example Usage

import torch
import numpy as np

# Load checkpoint
hifigan = torch.hub.load("bshall/hifigan:main", "hifigan_hubert_soft").cuda()
# Load mel-spectrogram
mel = torch.from_numpy(np.load("path/to/mel")).unsqueeze(0).cuda()
# Generate
wav, sr = hifigan.generate(mel)
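
To listen to the result, the generated waveform has to be written to disk. The following is a minimal sketch that uses only the Python standard library. `write_wav` is a hypothetical helper, not part of this repository, and it assumes a single channel of float samples in the range [-1, 1].

```python
import struct
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of the hifigan package.
    """
    # Clamp to [-1, 1] and scale to the signed 16-bit integer range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)  # mono
        f.setsampwidth(2)  # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

Assuming `wav` from the example above is a tensor holding a single utterance, a call could look like `write_wav("out.wav", wav.squeeze().cpu().tolist(), sr)`.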

Train

Step 1: Download and extract the LJ-Speech dataset

Step 2: Resample the audio to 16kHz:

usage: resample.py [-h] [--sample-rate SAMPLE_RATE] in-dir out-dir

Resample an audio dataset.

positional arguments:
  in-dir                path to the dataset directory
  out-dir               path to the output directory

optional arguments:
  -h, --help            show this help message and exit
  --sample-rate SAMPLE_RATE
                        target sample rate (default 16kHz)

Step 3: Download the dataset splits and move them into the root of the dataset directory. After steps 2 and 3 your dataset directory should look like this:

LJSpeech-1.1
│   test.txt
│   train.txt
│   validation.txt
├───mels
└───wavs

Note: the mels directory is optional. If you want to fine-tune HiFi-GAN, the mels directory should contain ground-truth aligned spectrograms from an acoustic model.
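
Before starting a long training run, it can help to confirm that the dataset directory matches the layout shown above. This is a minimal sketch; `missing_entries` is a hypothetical helper that is not part of the repository, and it treats mels as optional, as the note says.

```python
from pathlib import Path

# Names taken from the directory tree above; "mels" is optional and not checked.
REQUIRED_FILES = ("train.txt", "validation.txt", "test.txt")
REQUIRED_DIRS = ("wavs",)


def missing_entries(dataset_dir):
    """Return the required files and directories missing from dataset_dir."""
    root = Path(dataset_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

Calling `missing_entries("LJSpeech-1.1")` returns an empty list when every required entry is present, and otherwise the names that still need to be added.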

Step 4: Train HiFi-GAN:

usage: train.py [-h] [--resume RESUME] [--finetune] dataset-dir checkpoint-dir

Train or finetune HiFi-GAN.

positional arguments:
  dataset-dir      path to the preprocessed data directory
  checkpoint-dir   path to the checkpoint directory

optional arguments:
  -h, --help       show this help message and exit
  --resume RESUME  path to the checkpoint to resume from
  --finetune       whether to finetune (note that a resume path must be given)
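
The help text notes that --finetune needs a resume path. One way to enforce such a constraint up front is sketched below with argparse. This is an illustration shaped after the usage message above, not the repository's actual train.py, and `parse_train_args` is a hypothetical name.

```python
import argparse
from pathlib import Path


def parse_train_args(argv=None):
    """Parse arguments shaped like the train.py usage message above."""
    parser = argparse.ArgumentParser(description="Train or finetune HiFi-GAN.")
    parser.add_argument("dataset_dir", metavar="dataset-dir", type=Path,
                        help="path to the preprocessed data directory")
    parser.add_argument("checkpoint_dir", metavar="checkpoint-dir", type=Path,
                        help="path to the checkpoint directory")
    parser.add_argument("--resume", type=Path,
                        help="path to the checkpoint to resume from")
    parser.add_argument("--finetune", action="store_true",
                        help="whether to finetune (a resume path must be given)")
    args = parser.parse_args(argv)
    # Fail early rather than discovering the missing checkpoint mid-run.
    if args.finetune and args.resume is None:
        parser.error("--finetune requires --resume")
    return args
```

With this check, passing --finetune without --resume exits immediately with a usage error instead of failing later during training.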

Generate

To generate using the trained HiFi-GAN models, see Example Usage or use the generate.py script:

usage: generate.py [-h] [--model-name {hifigan,hifigan-hubert-soft,hifigan-hubert-discrete}] in-dir out-dir

Generate audio for a directory of mel-spectrograms using HiFi-GAN.

positional arguments:
  in-dir                path to directory containing the mel-spectrograms
  out-dir               path to output directory

optional arguments:
  -h, --help            show this help message and exit
  --model-name {hifigan,hifigan-hubert-soft,hifigan-hubert-discrete}
                        available models

Acknowledgements

This repo is based heavily on https://github.com/jik876/hifi-gan.


Comments

Is the pretrained discriminator weight of the base model available?

Thanks for the nice work, @bshall.

I'm trying to train HiFi-GAN now, but training it from scratch on another dataset takes a long time.

If the discriminator of the base model were also available, I could start fine-tuning from that vocoder. It seems that you released only the generator. Could you also release the discriminator weights?

opened by seastar105 3

NaN during training when using own dataset

While fine-tuning works as expected, doing regular training with a dataset that isn't LJSpeech would eventually cause a NaN loss at some point. The culprit appears to be the following line, which causes a division by zero if wav happens to contain perfect silence:

https://github.com/bshall/hifigan/blob/374a4569eae5437e2c80d27790ff6fede9fc1c46/hifigan/dataset.py#L106

I'm not sure what the best solution for this would be. As a quick fix, I simply clipped the divisor so it can't reach zero:

wav = flip * gain * wav / max([wav.abs().max(), 0.001])
opened by cjay42 0
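
The quick fix quoted in the NaN issue above can also be written with a tensor clamp, which keeps the divisor a tensor and avoids mixing tensors and floats in Python's built-in max. This is a minimal sketch, not the repository's code: `normalize_peak` and `eps` are hypothetical names, and `gain` and `flip` are passed in as plain numbers for illustration.

```python
import torch


def normalize_peak(wav, gain=1.0, flip=1.0, eps=1e-3):
    """Scale wav by its peak amplitude without dividing by zero.

    Hypothetical helper: eps is a lower bound on the divisor, so a clip of
    perfect silence stays all zeros instead of becoming NaN.
    """
    peak = wav.abs().max().clamp(min=eps)
    return flip * gain * wav / peak
```

For example, `normalize_peak(torch.zeros(16000))` returns a tensor of zeros, while a non-silent clip is scaled so that its peak magnitude equals `gain`.
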
How to use this Vocoder with your Tacotron?

Thank you for your work. I used your Tacotron in your Universal Vocoding. The quality of the speech is excellent. However, the inference speed is slow. For that reason, I would like to use this HiFi-GAN as a vocoder. But Tacotron's n_mel is 80, while HiFi-GAN's n_mel is 128. How can I use HiFi-GAN with Tacotron?

opened by gheyret 0

Releases(v0.1)

v0.1(Oct 17, 2021)

HiFi-GAN vocoders fine-tuned on the HuBERT-Soft and HuBERT-Discrete voice conversion systems.
Source code(tar.gz)
Source code(zip)
dev.txt(1.17 KB)
hifigan-67926ec6.pt(54.89 MB)
hifigan-hubert-discrete-bbad3043.pt(54.89 MB)
hifigan-hubert-soft-65f03469.pt(54.89 MB)
test.txt(1.17 KB)
train.txt(151.17 KB)

Owner

Benjamin van Niekerk

PhD student at Stellenbosch University. Interested in speech and audio technology.

GitHub Repository https://bshall.github.io/soft-vc/
