Includes MelGAN, HifiGAN and Multiband-HifiGAN; NHV may be added in the future.

Last update: Dec 16, 2022


Overview

Fast (GAN Based Neural) Vocoder

Chinese README

Todo

Submit demo
Support NHV

Description

Includes MelGAN, HifiGAN and Multiband-HifiGAN; NHV may be added in the future. Developed on the BiaoBei dataset. You can modify conf and hparams.py to fit your own dataset and model.

Usage

Prepare data
- write the paths of the wav files to a file, for example: cd dataset && python3 biaobei.py
- bash preprocess.sh <wav path file> <path to save processed data> dataset/audio dataset/mel
- for example: bash preprocess.sh dataset/BZNSYP.txt processed dataset/audio dataset/mel
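
The first step above needs a text file that lists the wav files. A minimal sketch of how such a list could be produced for another dataset is shown here. It assumes one path per line, which this README does not state, and `write_wav_list` is a hypothetical helper, not the repository's biaobei.py.

```python
from pathlib import Path


def write_wav_list(wav_dir, list_file):
    """Write the path of every .wav file under wav_dir to list_file.

    Assumption: one path per line is the format the preprocessing step
    expects; check the repository's own script before relying on this.
    """
    # Sort so that the listing is reproducible across runs.
    wav_paths = sorted(str(p) for p in Path(wav_dir).rglob("*.wav"))
    text = "".join(path + "\n" for path in wav_paths)
    Path(list_file).write_text(text, encoding="utf-8")
    return wav_paths
```

The resulting file would then be passed as the first argument to preprocess.sh.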

Train

command:

bash train.sh \
    <GPU ids> \
    /path/to/audio/train \
    /path/to/audio/valid \
    /path/to/mel/train \
    /path/to/mel/valid \
    <model name> \
    <if multi band> \
    <if use scheduler> \
    <path to configuration file>

for example:

bash train.sh \
    0 \
    dataset/audio/train \
    dataset/audio/valid \
    dataset/mel/train \
    dataset/mel/valid \
    hifigan \
    0 0 0 \
    conf/hifigan/light.yaml

Train from checkpoint

command:

bash train.sh \
    <GPU ids> \
    /path/to/audio/train \
    /path/to/audio/valid \
    /path/to/mel/train \
    /path/to/mel/valid \
    <model name> \
    <if multi band> \
    <if use scheduler> \
    <path to configuration file> \
    /path/to/checkpoint \
    <step of checkpoint>

Synthesize

command:

bash synthesize.sh \
    /path/to/checkpoint \
    /path/to/mel \
    /path/for/saving/wav \
    <model name> \
    /path/to/configuration/file
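
Because synthesize.sh takes its arguments by position, it is easy to swap two paths by accident. The sketch below wraps the call in a function with named parameters. The argument order is taken from the command above; the function names and the example values in the comment are hypothetical.

```python
import subprocess


def build_synthesize_command(checkpoint, mel_path, wav_dir, model_name, config_file):
    """Return the argv list for synthesize.sh in the positional order shown above."""
    return [
        "bash",
        "synthesize.sh",
        str(checkpoint),
        str(mel_path),
        str(wav_dir),
        str(model_name),
        str(config_file),
    ]


def run_synthesize(**kwargs):
    """Run synthesize.sh from the repository root; raises if the script fails."""
    subprocess.run(build_synthesize_command(**kwargs), check=True)


# Hypothetical usage (all paths are placeholders):
# run_synthesize(checkpoint="checkpoint.pt", mel_path="dataset/mel/valid",
#                wav_dir="output", model_name="hifigan",
#                config_file="conf/hifigan/light.yaml")
```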

Acknowledgments

Comments

Why set L=30?

Hello, I have a question. In the paper, the shape of the basis matrix is [32, 256], but in the code it is [30, 256]. According to the function "overlap_and_add", output_size = (frames - 1) * frame_step + frame_length. If L=30, I think the output cannot match the real wave length. For example, with hop_len=256 and mel.shape=[80, 140], the theoretical output wave length is 140*256=35840, but according to the code the output wave length is 33600.

Thanks in advance.

opened by yingfenging 3
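
The arithmetic in this comment can be checked with the formula it quotes. The sketch below is one reading that reproduces both numbers: it assumes the mel frames are upsampled by a factor of 8 before overlap-and-add and that frames do not overlap (frame_step equals frame_length). Both assumptions are inferred from the figures in the comment and are not confirmed by the author.

```python
def overlap_add_length(frames, frame_step, frame_length):
    """Output length of overlap-and-add, using the formula quoted in the comment."""
    return (frames - 1) * frame_step + frame_length


mel_frames = 140
upsample = 8  # assumption: inferred from the numbers, not stated in the source

for basis_len in (30, 32):
    # Assumption: non-overlapping frames, so frame_step == frame_length.
    samples = overlap_add_length(mel_frames * upsample, basis_len, basis_len)
    print(basis_len, samples, samples // mel_frames)

# prints:
# 30 33600 240
# 32 35840 256
```

Under these assumptions, L=30 corresponds to an effective hop of 240 samples per mel frame and L=32 to a hop of 256, which would explain the mismatch when mels are extracted with hop_len=256.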
Link to Basis-MelGAN paper?

Hi Zhengxi, congrats on your paper's acceptance at Interspeech 2021!

I got pretty interested in your paper while reading the abstract of Basis-MelGAN in the README, but I could not find any link to the paper. The Interspeech conference is only 2 months away, but do you have any plans to publish the paper on arXiv in the near future?

opened by seungwonpark 2
Random start index in WeightDataset

At this line: https://github.com/xcmyz/FastVocoder/blob/a9af370be896b1096e746ce6489fb16fef8ca585/data/dataset.py#L97

If the input mel is shorter than fix-length, the random start index raises an error. I have used try/except to skip these short audios, but I wonder whether this is handled in collate.

In addition, the segment size I found in hifigan is 32, but in basis-melgan it (fix-length) is set to 140. Is there any difference between the 140 used for BiaoBei and the value for LJSpeech?

opened by v-nhandt21 0
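
A common way to avoid this kind of failure is to guard the random start and pad short inputs instead of skipping them. The sketch below is not the repository's code. `pick_segment` is a hypothetical helper, and it assumes the start index is drawn with something like `random.randint`, which raises `ValueError` when the upper bound is below the lower bound.

```python
import random


def pick_segment(mel_len, fix_length, rng=random):
    """Return (start, pad) for a fixed-length crop of a mel with mel_len frames.

    start is the first frame of the crop; pad is the number of frames that
    must be appended so the crop reaches fix_length.
    """
    if mel_len >= fix_length:
        # randint is inclusive on both ends, so the last valid start
        # is mel_len - fix_length.
        return rng.randint(0, mel_len - fix_length), 0
    # Too short: take everything from the start and pad the remainder.
    return 0, fix_length - mel_len
```

With fix-length 140, a mel of 100 frames would give start 0 and 40 frames of padding, while a mel of 200 frames would give a start between 0 and 60 with no padding.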
Can basis-melgan be used as a universal vocoder?

I tried it on a single-speaker dataset, and the RTF surprised me. Have you ever used basis-melgan on a multi-speaker dataset, or is it suitable for unseen-speaker TTS synthesis?

opened by mayfool 0
Shape mismatch error on new dataset
Hi, thanks for your work!

The frame rate of my dataset is 22050, and the hop size of the text2mel model is 256. I have changed hparams.py accordingly, but training results in an exception (preprocessing was fine):

File "/home/user/speechlab/FastVocoder-main/model/loss/loss.py", line 23, in forward assert est_source_sub_band.size(1) == wav_sub_band.size(1)

I figured out that model inference still uses a hop size of 240. So how can I make your code fully compatible with other datasets? It seems that the code is somehow hardcoded for the BiaoBei dataset.
opened by tekinek 1
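
A mismatch like this can be caught before training by checking each preprocessed pair against the hop size the model will use. The sketch below is a hypothetical check, not part of the repository. It allows a tolerance of one frame because framing conventions differ between mel extractors, which is an assumption.

```python
def check_pair(wav_len, mel_frames, hop_size, tolerance=1):
    """Return True if a wav of wav_len samples matches mel_frames at hop_size."""
    expected_frames = wav_len // hop_size
    return abs(expected_frames - mel_frames) <= tolerance


# A 35840-sample wav with 140 mel frames is consistent with hop 256 ...
print(check_pair(35840, 140, 256))  # True
# ... but not with hop 240, which would expect about 149 frames.
print(check_pair(35840, 140, 240))  # False
```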
Multiband Architecture

Hi author, I found the note "the generated audio has interference at a specific frequency" in this repo. I have encountered a straight line at a specific frequency when developing a similar multiband architecture, and I wonder whether this is the phenomenon you mentioned. Do you have any advice or solutions? Thanks.
help wanted

opened by Rongjiehuang 6

Releases(v1.0)

v1.0(Jun 24, 2021)

Source code(tar.gz)
Source code(zip)
basis.melgan.pt(53.36 MB)

Owner

Zhengxi Liu (刘正曦)

Interested in high performance neural vocoder and expressive TTS acoustic model. Member of DeepMist and developed MistGPU.
