Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC)

Last update: Dec 28, 2022

Overview

ppg-vc

This repo implements several kinds of PPG-based VC models. Pretrained models are available. More models are on the way.

Notes:

The PPG model provided in conformer_ppg_model is based on a hybrid CTC-attention phoneme recognizer trained on LibriSpeech (960 hours). PPGs have a frame shift of 10 ms and a dimensionality of 144. This model is very similar to the one used in this paper.
This repo uses HiFi-GAN V1 as the vocoder model; the synthesized audio has a sampling rate of 24 kHz.
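
As a rough illustration of the numbers in the notes above, the sketch below works out how many PPG frames an utterance yields and how many audio samples one 10 ms frame spans at 24 kHz. The function names are hypothetical and are not part of this repo; the frame count assumes one frame per full 10 ms step with no padding, so the actual extractor may differ by a frame or two.

```python
from typing import Tuple

PPG_FRAME_SHIFT_MS = 10      # frame shift stated in the notes
PPG_DIM = 144                # PPG dimensionality stated in the notes
VOCODER_SAMPLE_RATE = 24000  # sampling rate of the synthesized audio


def ppg_shape(duration_s: float) -> Tuple[int, int]:
    """Approximate (frames, dims) of the PPG matrix for one utterance.

    Assumption: one frame per full 10 ms step, no padding or trimming.
    """
    n_frames = round(duration_s * 1000) // PPG_FRAME_SHIFT_MS
    return n_frames, PPG_DIM


def samples_per_ppg_frame(sample_rate: int = VOCODER_SAMPLE_RATE) -> int:
    """Number of audio samples covered by one 10 ms frame."""
    return sample_rate * PPG_FRAME_SHIFT_MS // 1000


shape = ppg_shape(2.5)          # a 2.5 s utterance
hop = samples_per_ppg_frame()   # samples per frame at 24 kHz
```

Under these assumptions, a 2.5 s utterance gives a PPG matrix of about 250 x 144, and each frame corresponds to 240 samples of 24 kHz audio.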

Highlights

Any-to-many VC
Any-to-any VC (a.k.a. few-shot or one-shot VC)

How to use

Data preprocessing

Please run 1_compute_ctc_att_bnf.py to compute PPG features.
Please run 2_compute_f0.py to compute fundamental frequency.
Please run 3_compute_spk_dvecs.py to compute speaker d-vectors.
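
To give a feel for what the fundamental-frequency step produces, here is a minimal, self-contained sketch that estimates F0 for a single frame by picking the autocorrelation peak. This is not the method used by 2_compute_f0.py, whose algorithm and options are not described here; the function name, the 50-400 Hz search range, and the 16 kHz test tone are all assumptions made for illustration.

```python
import math
from typing import List


def estimate_f0(frame: List[float], sample_rate: int,
                f0_min: float = 50.0, f0_max: float = 400.0) -> float:
    """Estimate F0 (Hz) of one frame from its autocorrelation peak.

    Returns 0.0 for a silent frame. Illustrative only: no voicing
    decision, no smoothing, no octave-error handling.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    lag_min = int(sample_rate / f0_max)  # shortest period searched
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = 0, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[n] * frame[n + lag]
                    for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


sr = 16000
# 40 ms of a 200 Hz sine as a synthetic "voiced" frame.
tone = [math.sin(2 * math.pi * 200.0 * n / sr) for n in range(640)]
f0 = estimate_f0(tone, sr)
silent = estimate_f0([0.0] * 640, sr)
```

For the synthetic tone the estimate lands on 200 Hz, and the all-zero frame returns 0.0. Real speech needs a more robust tracker with voiced/unvoiced detection, which is presumably what the repo's script provides.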

Training

Please refer to run.sh

Conversion

Please refer to test.sh

TODO

Upload pretrained models.

Citations

@ARTICLE{liu2021any,
  author={Liu, Songxiang and Cao, Yuewen and Wang, Disong and Wu, Xixin and Liu, Xunying and Meng, Helen},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling}, 
  year={2021},
  volume={29},
  number={},
  pages={1717-1728},
  doi={10.1109/TASLP.2021.3076867}
}

@inproceedings{Liu2018,
  author={Songxiang Liu and Jinghua Zhong and Lifa Sun and Xixin Wu and Xunying Liu and Helen Meng},
  title={Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={496--500},
  doi={10.21437/Interspeech.2018-1504},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1504}
}

Owner

Liu Songxiang
