Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Last update: Dec 07, 2022

Overview

One-Shot Voice Conversion with Weight Adaptive Instance Normalization

By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain.

This repo is the official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Audio samples are available here.

Dependencies

python 3.6.0
pytorch 1.4.0
pyyaml 5.4.1
numpy 1.19.5
librosa 0.8.0
soundfile 0.10.2
tensorboardX 2.1

Preprocess

What you need to prepare before running this project, and how to prepare it

We use ParallelWaveGAN as our vocoder and VCTK as our dataset.
To run this project, first install ParallelWaveGAN by following the instructions in the ParallelWaveGAN project.
Then prepare all the mel-spectrogram data the same way ParallelWaveGAN does.
Prepare the speaker_used.json file yourself, following the examples in ./data/80_train_speaker_used.json and ./data/fine_tune_speaker_used.json.
Prepare the feats.scp file by running ./convert_decode/convert_mel/get_scp.py.

We assume that your prepared mel-spectrograms are organized in a directory tree like this:

├── p225
│   ├── p225_001-feats.npy
│   ├── p225_004-feats.npy
│   ├── p225_005-feats.npy
│   ......
├── p226
│   ├── p226_001-feats.npy
│   ├── p226_003-feats.npy
│   ├── p226_004-feats.npy
│   ......
├── p227
│   ......
├── p228
│   ......
│   ...
│   ...
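
To sanity-check the layout above before running get_scp.py, a small helper like the following can list every feature file per speaker. This is only a sketch: the function names are hypothetical, and the one-line-per-utterance "utterance-id path" listing it writes is an assumption, not necessarily the format that get_scp.py produces.

```python
import os


def list_feats(root):
    """Return sorted (utt_id, path) pairs for each <speaker>/<utt>-feats.npy under root."""
    suffix = "-feats.npy"
    entries = []
    for speaker in sorted(os.listdir(root)):
        speaker_dir = os.path.join(root, speaker)
        if not os.path.isdir(speaker_dir):
            continue  # skip stray files next to the speaker folders
        for name in sorted(os.listdir(speaker_dir)):
            if name.endswith(suffix):
                utt_id = name[: -len(suffix)]  # e.g. "p225_001"
                entries.append((utt_id, os.path.join(speaker_dir, name)))
    return entries


def write_listing(entries, out_path):
    """Write one 'utt_id path' line per utterance (assumed format, see note above)."""
    with open(out_path, "w") as f:
        for utt_id, path in entries:
            f.write("{} {}\n".format(utt_id, path))
```

Calling list_feats on the directory that contains p225, p226, ... and comparing the number of entries with the number of utterances you expect is a quick way to catch missing or misnamed files.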

Training

Run the pretraining stage with bash run_main.sh. We use 80 speakers from the VCTK dataset, with all utterances for each speaker.

Fine Tuning

Run the fine-tuning stage with bash run_fine_tune.sh. We use 10 other speakers from the VCTK dataset (not among the 80 used for pretraining), with only 1 utterance per speaker.
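
The 80/10 speaker split with a single fine-tuning utterance per speaker can be illustrated as follows. This is a sketch under assumptions: the function name, the random selection, and the dictionary layout are hypothetical. The speakers actually used are the ones listed in the JSON files under ./data, and this code does not reproduce their format.

```python
import random


def split_speakers(utts_by_speaker, n_train=80, n_fine_tune=10, seed=0):
    """Pick disjoint pretraining and fine-tuning speakers.

    utts_by_speaker maps a speaker ID to a list of utterance IDs.
    Pretraining speakers keep all utterances; each fine-tuning speaker
    keeps exactly one (chosen at random here, which is an assumption).
    """
    speakers = sorted(utts_by_speaker)
    if len(speakers) < n_train + n_fine_tune:
        raise ValueError("not enough speakers for the requested split")
    rng = random.Random(seed)
    rng.shuffle(speakers)
    train_ids = sorted(speakers[:n_train])
    fine_ids = sorted(speakers[n_train:n_train + n_fine_tune])
    train = {s: sorted(utts_by_speaker[s]) for s in train_ids}
    fine_tune = {s: [rng.choice(sorted(utts_by_speaker[s]))] for s in fine_ids}
    return train, fine_tune
```

The remaining utterances of the fine-tuning speakers are left untouched, so they stay available as unseen material for inference.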

Inference

$ cd convert_decode/convert_mel
$ bash run_convert.sh

We generate converted utterances between the 10 one-shot speakers, using utterances from these speakers that were not seen during fine-tuning.
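
To get a feel for how many conversion directions exist between the 10 one-shot speakers, the ordered (source, target) pairs can be enumerated as below. This is a sketch only: the speaker IDs are placeholders, and whether run_convert.sh covers every ordered pair is an assumption.

```python
from itertools import permutations


def conversion_pairs(speakers):
    """Return all ordered (source, target) pairs with source != target."""
    return list(permutations(sorted(speakers), 2))


# Placeholder IDs; replace with the speakers from your fine-tuning set.
speakers = ["spk{:02d}".format(i) for i in range(10)]
pairs = conversion_pairs(speakers)
print(len(pairs))  # 90 ordered pairs for 10 speakers
```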
