SelfRemaster: SSL Speech Restoration

Last update: Jan 07, 2023

Overview

SelfRemaster: Self-Supervised Speech Restoration

Official implementation of SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

Demo

Audio samples
Audio effect transfer with Gradio + HuggingFace Spaces 🤗

Setup

Clone this repository: git clone https://github.com/Takaaki-Saeki/ssl_speech_restoration.git
Change into the repository directory: cd ssl_speech_restoration
Install the Python packages and download some pretrained models: ./setup.sh

Getting started

If you use the default Japanese corpora:
- Download JSUT Basic5000 and the JVS Corpus.
- Downsample them to 22.05 kHz and place them under data/ as jsut_22k and jvs_22k.
- Place the simulated low-quality data under data/ as jsut_22k-low and jvs_22k-low.
Alternatively, you can use arbitrary datasets by modifying the config files.
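
Before preprocessing, it can help to confirm that the data directory matches the layout described above. The following is a minimal sketch, not part of the repository: the function name check_data_layout is hypothetical, and it assumes the audio is stored as PCM .wav files somewhere below each corpus directory.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 22050  # 22.05 kHz, as required above
EXPECTED_DIRS = ["jsut_22k", "jvs_22k", "jsut_22k-low", "jvs_22k-low"]


def check_data_layout(data_root="data"):
    """Return a list of human-readable problems found under data_root."""
    problems = []
    root = Path(data_root)
    for name in EXPECTED_DIRS:
        corpus_dir = root / name
        if not corpus_dir.is_dir():
            problems.append(f"missing directory: {corpus_dir}")
            continue
        # Assumption: audio files are PCM .wav files below the corpus directory.
        for wav_path in sorted(corpus_dir.rglob("*.wav")):
            with wave.open(str(wav_path), "rb") as wav_file:
                rate = wav_file.getframerate()
            if rate != EXPECTED_RATE:
                problems.append(
                    f"{wav_path}: {rate} Hz, expected {EXPECTED_RATE} Hz"
                )
    return problems


if __name__ == "__main__":
    for problem in check_data_layout():
        print(problem)
```

An empty result means all four directories exist and every .wav file found is sampled at 22.05 kHz. The check does not verify that the low-quality directories actually contain degraded versions of the same utterances.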

Training

You can choose the MelSpec or SourceFilter model with the --config_path option.
As shown in the paper, the MelSpec model achieves higher quality.

First, split the data into train/val/test sets and dump them with the following command.

python preprocess.py --config_path configs/train/${feature}/ssl_jsut.yaml

To perform self-supervised learning with dual learning, run the following command.

python train.py \
    --config_path configs/train/${feature}/ssl_jsut.yaml \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

For other options, refer to train.py.
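
The ${feature} placeholder in the commands above selects the config directory. As a rough sketch of how the two steps could be scripted, the hypothetical helper below builds both command lines from a feature name. The flags are the ones shown above; the value "melspec" is an assumption based on the configs/test/melspec/ path used for audio effect transfer, so check the configs/train/ directory for the actual names.

```python
import shlex


def build_commands(feature, stage="ssl-dual", run_name="ssl_melspec_dual"):
    """Return the preprocess and train commands as argument lists."""
    # Assumption: the config lives at configs/train/<feature>/ssl_jsut.yaml.
    config_path = f"configs/train/{feature}/ssl_jsut.yaml"
    preprocess = ["python", "preprocess.py", "--config_path", config_path]
    train = [
        "python", "train.py",
        "--config_path", config_path,
        "--stage", stage,
        "--run_name", run_name,
    ]
    return preprocess, train


preprocess_cmd, train_cmd = build_commands("melspec")
print(shlex.join(preprocess_cmd))
print(shlex.join(train_cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) from the repository root, which stops the script if preprocessing fails before training starts.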

Speech restoration

To perform speech restoration of the test data, run the following command.

python eval.py \
    --config_path configs/test/${feature}/ssl_jsut.yaml \
    --ckpt_path ${path to checkpoint} \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

For other options, see eval.py.

Audio effect transfer

You can run a simple audio effect transfer demo using a model pretrained on real data.
Run the following command.

python aet_demo.py

To customize the dataset or model, edit audio_effect_transfer.yaml and run the following command.

python aet.py \
    --config_path configs/test/melspec/audio_effect_transfer.yaml \
    --stage ssl-dual \
    --run_name aet_melspec_dual

For other options, see aet.py.

Pretrained models

See here.

Reproducing results

You can generate simulated low-quality data as in the paper with the following command.

python simulated_data.py \
    --in_dir ${input_directory (e.g., path to jsut_22k)} \
    --output_dir ${output_directory (e.g., path to jsut_22k-low)} \
    --corpus_type ${single-speaker corpus or multi-speaker corpus} \
    --deg_type lowpass
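
To illustrate what a low-pass degradation does to a signal, here is a self-contained sketch using a Hamming-windowed sinc FIR filter in pure Python. It is not the implementation in simulated_data.py and may differ from the degradation used in the paper; the function names, the 2 kHz cutoff, and the filter length are arbitrary choices made for illustration.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Hamming-windowed sinc low-pass taps, normalized to unit DC gain."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal, taps):
    """Centered convolution; output has the same length as the input."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += tap * signal[j]
        out.append(acc)
    return out


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate, num_samples):
    """Unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(num_samples)]


SAMPLE_RATE = 22050
taps = lowpass_fir(cutoff_hz=2000, sample_rate=SAMPLE_RATE)  # arbitrary cutoff
for freq in (500, 8000):
    x = tone(freq, SAMPLE_RATE, 2000)
    y = apply_fir(x, taps)
    # Compare levels away from the zero-padded edges.
    print(freq, "Hz gain:", rms(y[200:1800]) / rms(x[200:1800]))
```

The tone below the cutoff passes almost unchanged, while the tone above it is strongly attenuated, which is the kind of band-limited audio a restoration model trained with --deg_type lowpass has to recover from.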

Then download the pretrained model corresponding to the deg_type and run the following command.

python eval.py \
    --config_path configs/train/${feature}/ssl_jsut.yaml \
    --ckpt_path ${path to checkpoint} \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

Citation

@article{saeki22selfremaster,
  title={{SelfRemaster}: {S}elf-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling},
  author={T. Saeki and S. Takamichi and T. Nakamura and N. Tanji and H. Saruwatari},
  journal={arXiv preprint arXiv:2203.12937},
  year={2022}
}

Owner

Takaaki Saeki