Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC)

Last update: Dec 28, 2022

Overview

ppg-vc

This repo implements several kinds of PPG-based VC models. Pretrained models are available. More models are on the way.

Notes:

The PPG model provided in conformer_ppg_model is based on a hybrid CTC-attention phoneme recognizer trained on LibriSpeech (960 hrs). PPGs have a frame shift of 10 ms and a dimensionality of 144. This model is very similar to the one used in this paper.
This repo uses HiFi-GAN V1 as the vocoder; the synthesized audio has a sampling rate of 24 kHz.
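
As a rough sanity check on the numbers in these notes, the sketch below works out how many 24 kHz audio samples correspond to one 10 ms PPG frame, and what PPG array shape to expect for an utterance of a given duration. The function names are hypothetical and not part of this repo, and a real extractor may produce a few frames more or fewer because of padding and windowing.

```python
# Constants taken from the notes above.
PPG_FRAME_SHIFT_MS = 10       # PPG frame shift in milliseconds
PPG_DIM = 144                 # PPG dimensionality
VOCODER_SAMPLE_RATE = 24000   # sampling rate of synthesized audio in Hz


def samples_per_frame(sample_rate=VOCODER_SAMPLE_RATE,
                      frame_shift_ms=PPG_FRAME_SHIFT_MS):
    """Number of audio samples covered by one PPG frame (hypothetical helper)."""
    return sample_rate * frame_shift_ms // 1000


def expected_ppg_shape(duration_s,
                       frame_shift_ms=PPG_FRAME_SHIFT_MS,
                       dim=PPG_DIM):
    """Approximate (frames, dim) shape of the PPG for an utterance.

    This ignores padding and windowing, so treat it as an estimate only.
    """
    n_frames = int(round(duration_s * 1000)) // frame_shift_ms
    return (n_frames, dim)


if __name__ == "__main__":
    print(samples_per_frame())
    print(expected_ppg_shape(2.5))
```

A check like this can help catch a mismatch between feature length and waveform length before training.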

Highlights

Any-to-many VC
Any-to-any VC (a.k.a. few/one-shot VC)

How to use

Data preprocessing

Please run 1_compute_ctc_att_bnf.py to compute PPG features.
Please run 2_compute_f0.py to compute fundamental frequency.
Please run 3_compute_spk_dvecs.py to compute speaker d-vectors.
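
The three steps above can be sketched as a small dry-run helper that lists the scripts in the order their numeric prefixes suggest and reports any that are missing. This is only an illustration: the helper names are hypothetical, and the scripts' command-line arguments are not documented here, so none are passed. Check each script for the options it actually expects.

```python
import sys
from pathlib import Path

# Script names come from the steps above; the descriptions paraphrase them.
PREPROCESSING_STEPS = [
    ("1_compute_ctc_att_bnf.py", "PPG features"),
    ("2_compute_f0.py", "fundamental frequency (F0)"),
    ("3_compute_spk_dvecs.py", "speaker d-vectors"),
]


def build_commands(repo_dir=".", python=sys.executable):
    """Return one command (as an argument list) per preprocessing script.

    No script arguments are included because they are not documented here.
    """
    repo = Path(repo_dir)
    return [[python, str(repo / script)] for script, _ in PREPROCESSING_STEPS]


def missing_scripts(repo_dir="."):
    """Return the names of preprocessing scripts not found in repo_dir."""
    repo = Path(repo_dir)
    return [s for s, _ in PREPROCESSING_STEPS if not (repo / s).is_file()]


if __name__ == "__main__":
    # Dry run: print what would be executed instead of executing it.
    for (script, what), cmd in zip(PREPROCESSING_STEPS, build_commands()):
        print(f"{what}: {' '.join(cmd)}")
    for name in missing_scripts():
        print(f"missing: {name}")
```

Printing the commands first makes it easy to confirm paths before adding the real arguments and running each step.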

Training

Please refer to run.sh.

Conversion

Please refer to test.sh.

TODO

Upload pretrained models.

Citations

@ARTICLE{liu2021any,
  author={Liu, Songxiang and Cao, Yuewen and Wang, Disong and Wu, Xixin and Liu, Xunying and Meng, Helen},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling}, 
  year={2021},
  volume={29},
  number={},
  pages={1717-1728},
  doi={10.1109/TASLP.2021.3076867}
}

@inproceedings{Liu2018,
  author={Songxiang Liu and Jinghua Zhong and Lifa Sun and Xixin Wu and Xunying Liu and Helen Meng},
  title={Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={496--500},
  doi={10.21437/Interspeech.2018-1504},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1504}
}

Owner

Liu Songxiang
