VoiceFixer VoiceFixer is a framework for general speech restoration.

Last update: Jan 06, 2023

Related tags

Overview

VoiceFixer

VoiceFixer is a framework for general speech restoration. We aim at the restoration of severly degraded speech and historical speech.

Paper

⚠️ We submit this paper to ICLR2022. Preprint on arxiv will be available before Oct.03 2021!

Usage

⚠️ Still working on it, stay tuned! Expect to be available before 2021.09.30.

Environment

# Download dataset and prepare running environment
source init.sh

Train from scratch

Let's take VF_UNet(voicefixer with unet as analysis module) as an example. Other model have the similar training and evaluation logic.

cd general_speech_restoration/voicefixer/unet
source run.sh

After that, you will get a log directory that look like this

├── unet
│   └── log
│       └── 2021-09-27-xxx
│           └── version_0
│               └── checkpoints
                    └──epoch=1.ckpt
│               └── code

Evaluation

Automatic evaluation and generate .csv file for the results.

cd general_speech_restoration/voicefixer/unet
# Basic usage
python3 handler.py  -c <str, path-to-checkpoint> \
                    -t <str, testset> \ 
                    -l <int, limit-utterance-number> \ 
                    -d <str, description of this evaluation> \

For example, if you like to evaluate on all testset. And each testset you intend to limit the number to 10 utterance.

python3 handler.py  -c  log/2021-09-27-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  base \ 
                    -l  10 \ 
                    -d  ten_utterance_for_each_testset \

There are generally seven testsets:

base: all testset
clip: testset with speech that have clipping threshold of 0.1, 0.25, and 0.5
reverb: testset with reverberate speech
general_speech_restoration: testset with speech that contain all kinds of random distortions
enhancement: testset with noisy speech
speech_super_resolution: testset with low resolution speech that have sampling rate of 2kHz, 4kHz, 8kHz, 16kHz, and 24kHz.

Demo

Demo page

Demo page contains comparison between single task speech restoration, general speech restoration, and voicefixer.

Pip package

We wrote a pip package for voicefixer.

Colab

You can try voicefixer using your own voice on colab!

Project Structure

.
├── dataloaders 
│   ├── augmentation # code for speech data augmentation.
│   └── dataloader # code for different kinds of dataloaders.
├── datasets 
│   ├── datasetParser # code for preparing each dataset
│   └── se # Dataset for speech enhancement (source init.sh)
│       ├── RIR_44k # Room Impulse Response 44.1kHz
│       │   ├── test
│       │   └── train
│       ├── TestSets # Evaluation datasets
│       │   ├── ALL_GSR # General speech restoration testset
│       │   │   ├── simulated
│       │   │   └── target
│       │   ├── DECLI # Speech declipping testset
│       │   │   ├── 0.1 # Different clipping threshold
│       │   │   ├── 0.25
│       │   │   ├── 0.5
│       │   │   └── GroundTruth
│       │   ├── DENOISE # Speech enhancement testset
│       │   │   └── vd_test
│       │   │       ├── clean_testset_wav
│       │   │       └── noisy_testset_wav
│       │   ├── DEREV # Speech dereverberation testset
│       │   │   ├── GroundTruth
│       │   │   └── Reverb_Speech
│       │   └── SR # Speech super resolution testset
│       │       ├── GroundTruth
│       │       └── cheby1
│       │           ├── 1000 # Different cutoff frequencies
│       │           ├── 12000
│       │           ├── 2000
│       │           ├── 4000
│       │           └── 8000
│       ├── vd_noise # Noise training dataset
│       └── wav48 # Speech training dataset
│           ├── test # Not used, included for completeness
│           └── train 
├── evaluation # The code for model evaluation
├── exp_results # The Folder that store evaluation result (in handler.py).
├── general_speech_restoration # GSR 
│   ├── unet # GSR_UNet
│   │   └── model_kqq_lstm_mask_gan
│   └── voicefixer # Each folder contains the training entry for each model.
│       ├── dnn # VF_DNN
│       ├── lstm # VF_LSTM
│       ├── unet # VF_UNet
│       └── unet_small # VF_UNet_S
├── resources 
├── single_task_speech_restoration # SSR
│   ├── declip_unet # Declip_UNet
│   ├── derev_unet # Derev_UNet
│   ├── enh_unet # Enh_UNet
│   └── sr_unet # SR_UNet
├── tools
└── callbacks

Citation

⚠️ Will be available once the paper is ready.

VoiceFixer VoiceFixer is a framework for general speech restoration.

Related tags

Overview

VoiceFixer

Paper

Usage

Environment

Train from scratch

Evaluation

Demo

Demo page

Pip package

Colab

Project Structure

Citation

Owner

Leo

Autoregressive Entity Retrieval

Code for "Finetuning Pretrained Transformers into Variational Autoencoders"

PyTorch Implementation of "Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging" (Findings of ACL 2022)

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Final Project Bootcamp Zero

Machine learning models from Singapore's NLP research community

chaii - hindi & tamil question answering

Russian GPT3 models.

Backend for the Autocomplete platform. An AI assisted coding platform.

Code associated with the Don't Stop Pretraining ACL 2020 paper

Ukrainian TTS (text-to-speech) using Coqui TTS

Задания КЕГЭ по информатике 2021 на Python

Ceaser-Cipher - The Caesar Cipher technique is one of the earliest and simplest method of encryption technique

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

🗣️ NALP is a library that covers Natural Adversarial Language Processing.

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

A desktop GUI providing an audio interface for GPT3.

Converts python code into c++ by using OpenAI CODEX.

KoBERT - Korean BERT pre-trained cased (KoBERT)

Recognition of 38 speech commands in russian. Based on Yandex Cup 2021 ML Challenge: ASR