Voice Conversion by CycleGAN (语音克隆/语音转换):CycleGAN-VC3

Overview

CycleGAN-VC3-PyTorch

standard-readme compliant Donate

中文说明 | English


This code is a PyTorch implementation for paper: CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion, a nice work on Voice-Conversion/Voice Cloning.

  • Dataset
    • VC
  • Usage
    • Training
    • Example
  • Demo
  • Reference

CycleGAN-VC3

Project Page

Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speeches without using a parallel corpus. Recently, CycleGAN-VC [3] and CycleGAN-VC2 [2] have shown promising results regarding this problem and have been widely used as benchmark methods. However, owing to the ambiguity of the effectiveness of CycleGAN-VC/VC2 for mel-spectrogram conversion, they are typically used for mel-cepstrum conversion even when comparative methods employ mel-spectrogram as a conversion target. To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion. Through initial experiments, we discovered that their direct applications compromised the time-frequency structure that should be preserved during conversion. To remedy this, we propose CycleGAN-VC3, an improvement of CycleGAN-VC2 that incorporates time-frequency adaptive normalization (TFAN). Using TFAN, we can adjust the scale and bias of the converted features while reflecting the time-frequency structure of the source mel-spectrogram. We evaluated CycleGAN-VC3 on inter-gender and intra-gender non-parallel VC. A subjective evaluation of naturalness and similarity showed that for every VC pair, CycleGAN-VC3 outperforms or is competitive with the two types of CycleGAN-VC2, one of which was applied to mel-cepstrum and the other to mel-spectrogram.

network comparison Figure 1. We developed time-frequency adaptive normalization (TFAN), which extends instance normalization [5] so that the affine parameters become element-dependent and are determined according to an entire input mel-spectrogram.


This repository contains:

  1. TFAN module code which implemented the TFAN module
  2. model code which implemented the model network.
  3. audio preprocessing script you can use to create cache for training data.
  4. training scripts to train the model.

Table of Contents


Requirement

pip install -r requirements.txt

Usage


Reference

  1. CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion. Paper, Project
  2. CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion. Paper, Project
  3. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. Paper, Project
  4. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Paper, Project, Code
  5. Image-to-Image Translation with Conditional Adversarial Nets. Paper, Project, Code

Donation

If this project help you reduce time to develop, you can give me a cup of coffee :)

AliPay(支付宝)

ali_pay

WechatPay(微信)

wechat_pay

paypal


License

MIT © Kun

Owner
Kun Ma
artizan
Kun Ma
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

EfficientZero (NeurIPS 2021) Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021. Environments Effi

Weirui Ye 671 Jan 03, 2023
Code for our ICCV 2021 Paper "OadTR: Online Action Detection with Transformers".

Code for our ICCV 2021 Paper "OadTR: Online Action Detection with Transformers".

66 Dec 15, 2022
Implicit Deep Adaptive Design (iDAD)

Implicit Deep Adaptive Design (iDAD) This code supports the NeurIPS paper 'Implicit Deep Adaptive Design: Policy-Based Experimental Design without Lik

Desi 12 Aug 14, 2022
Videocaptioning.pytorch - A simple implementation of video captioning

pytorch implementation of video captioning recommend installing pytorch and pyth

Yiyu Wang 2 Jan 01, 2022
BMVC 2021 Oral: code for BI-GCN: Boundary-Aware Input-Dependent Graph Convolution for Biomedical Image Segmentation

BMVC 2021 BI-GConv: Boundary-Aware Input-Dependent Graph Convolution for Biomedical Image Segmentation Necassary Dependencies: PyTorch 1.2.0 Python 3.

Yanda Meng 15 Nov 08, 2022
Code for "Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations"

Infinitely Deep Bayesian Neural Networks with SDEs This library contains JAX and Pytorch implementations of neural ODEs and Bayesian layers for stocha

Winnie Xu 95 Nov 26, 2021
Equivariant layers for RC-complement symmetry in DNA sequence data

Equi-RC Equivariant layers for RC-complement symmetry in DNA sequence data This is a repository that implements the layers as described in "Reverse-Co

7 May 19, 2022
MODALS: Modality-agnostic Automated Data Augmentation in the Latent Space

Update (20 Jan 2020): MODALS on text data is avialable MODALS MODALS: Modality-agnostic Automated Data Augmentation in the Latent Space Table of Conte

38 Dec 15, 2022
nanodet_plus,yolov5_v6.0

OAK_Detection OAK设备上适配nanodet_plus,yolov5_v6.0 Environment pytorch = 1.7.0

炼丹去了 1 Feb 18, 2022
Continuum Learning with GEM: Gradient Episodic Memory

Gradient Episodic Memory for Continual Learning Source code for the paper: @inproceedings{GradientEpisodicMemory, title={Gradient Episodic Memory

Facebook Research 360 Dec 27, 2022
Reimplementation of Learning Mesh-based Simulation With Graph Networks

Pytorch Implementation of Learning Mesh-based Simulation With Graph Networks This is the unofficial implementation of the approach described in the pa

Jingwei Xu 33 Dec 14, 2022
Fashion Entity Classification

Fashion-Entity-Classification - Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grays

ADITYA SHAH 1 Jan 04, 2022
Aiming at the common training datsets split, spectrum preprocessing, wavelength select and calibration models algorithm involved in the spectral analysis process

Aiming at the common training datsets split, spectrum preprocessing, wavelength select and calibration models algorithm involved in the spectral analysis process, a complete algorithm library is esta

Fu Pengyou 50 Jan 07, 2023
Demo project for real time anomaly detection using kafka and python

kafkaml-anomaly-detection Project for real time anomaly detection using kafka and python It's assumed that zookeeper and kafka are running in the loca

Rodrigo Arenas 36 Dec 12, 2022
chainladder - Property and Casualty Loss Reserving in Python

chainladder (python) chainladder - Property and Casualty Loss Reserving in Python This package gets inspiration from the popular R ChainLadder package

Casualty Actuarial Society 130 Dec 07, 2022
The official implementation of Theme Transformer

Theme Transformer This is the official implementation of Theme Transformer. Checkout our demo and paper : Demo | arXiv Environment: using python versi

Ian Shih 85 Dec 08, 2022
COVINS -- A Framework for Collaborative Visual-Inertial SLAM and Multi-Agent 3D Mapping

COVINS -- A Framework for Collaborative Visual-Inertial SLAM and Multi-Agent 3D Mapping Version 1.0 COVINS is an accurate, scalable, and versatile vis

ETHZ V4RL 183 Dec 27, 2022
This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.

This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.

75 Dec 02, 2022
Pytorch reimplementation of the Mixer (MLP-Mixer: An all-MLP Architecture for Vision)

MLP-Mixer Pytorch reimplementation of Google's repository for the MLP-Mixer (Not yet updated on the master branch) that was released with the paper ML

Eunkwang Jeon 18 Dec 08, 2022
This is a model made out of Neural Network specifically a Convolutional Neural Network model

This is a model made out of Neural Network specifically a Convolutional Neural Network model. This was done with a pre-built dataset from the tensorflow and keras packages. There are other alternativ

9 Oct 18, 2022