An evaluation toolkit for voice conversion models.

Last update: Aug 29, 2022

Overview

Voice-conversion-evaluation


Sample test pair

Generate the metadata for evaluating models.
The parsers directory contains several available corpus parsers.

  python sampler.py [name of source corpus] [path of source dir] [name of target corpus] [path of target dir] -n [number of samples] -nt [number of target utterances] -o [path of output dir]
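The sampler's internals are not described here. As a rough illustration of how seeded sampling can produce the pair fields documented in this section, the sketch below uses in-memory dictionaries in place of the real corpus parsers. `sample_pairs` and its input layout are hypothetical and are not the toolkit's API.

```python
import random
from typing import Dict, List


def sample_pairs(
    source_utts: Dict[str, List[dict]],
    target_utts: Dict[str, List[str]],
    n_samples: int,
    n_target_samples: int,
    source_seed: int = 0,
    target_seed: int = 0,
) -> List[dict]:
    """Draw source/target evaluation pairs reproducibly (hypothetical sketch).

    source_utts maps a source speaker to a list of dicts with the keys
    "path", "content" and "second"; target_utts maps a target speaker to a
    list of utterance paths relative to the target dir.
    """
    # Separate RNGs, mirroring the separate source/target seeds in the metadata.
    src_rng = random.Random(source_seed)
    tgt_rng = random.Random(target_seed)

    # Flatten all source utterances so each one has the same chance of selection.
    all_sources = [
        (speaker, utt) for speaker, utts in sorted(source_utts.items()) for utt in utts
    ]
    chosen = src_rng.sample(all_sources, n_samples)

    target_speakers = sorted(target_utts)
    pairs = []
    for source_speaker, utt in chosen:
        target_speaker = tgt_rng.choice(target_speakers)
        pairs.append({
            "source_speaker": source_speaker,
            "target_speaker": target_speaker,
            "src_utt": utt["path"],
            "tgt_utts": tgt_rng.sample(target_utts[target_speaker], n_target_samples),
            "content": utt["content"],
            "src_second": utt["second"],
        })

    # Match the documented ordering: longest source utterance first.
    pairs.sort(key=lambda pair: pair["src_second"], reverse=True)
    return pairs
```

Fixing both seeds makes the same pairs come out on every run, which is why the metadata records them.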

The pairs in the metadata are sorted by src_second from longest to shortest.
The metadata contains:

source_corpus: The name of the source corpus.
source_corpus_speaker_number: The number of speakers in the source corpus.
source_random_seed: The random seed used for sampling source utterances.
target_corpus: The name of the target corpus.
target_corpus_speaker_number: The number of speakers in the target corpus.
target_random_seed: The random seed used for sampling target utterances.
n_samples: The number of samples.
n_target_samples: The number of target utterances.
pairs: The list of evaluation pairs. Each pair contains:
- source_speaker: The name of the source speaker.
- target_speaker: The name of the target speaker.
- src_utt: The path of the source utterance, relative to the source dir.
- tgt_utts: The list of target utterance paths, relative to the target dir.
- content: The content (transcript) of the source utterance.
- src_second: The duration of the source utterance in seconds.
- converted: This entry is not created by the sampler; you need to add the relative path of your converted output yourself.
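Because the converted entry has to be filled in by hand, a small script can add it to every pair. This is a minimal sketch that assumes the metadata is stored as JSON and that your converted files follow a hypothetical `<source_speaker>_to_<target_speaker>_<stem>.wav` naming scheme; both `add_converted_paths` and `update_metadata_file` are illustrative helpers, not part of the toolkit, so adapt the naming and location to your own output.

```python
import json
from pathlib import Path, PurePosixPath


def add_converted_paths(metadata: dict, converted_dir: str) -> dict:
    """Fill in the "converted" entry of every pair (hypothetical naming scheme).

    Assumes each converted file is named after its source speaker, target
    speaker and the stem of the source utterance, stored under converted_dir.
    """
    for pair in metadata["pairs"]:
        stem = PurePosixPath(pair["src_utt"]).stem
        name = f"{pair['source_speaker']}_to_{pair['target_speaker']}_{stem}.wav"
        # Keep forward slashes so the stored relative path is platform-independent.
        pair["converted"] = str(PurePosixPath(converted_dir) / name)
    return metadata


def update_metadata_file(metadata_path: str, converted_dir: str) -> None:
    """Load a JSON metadata file (assumed format), add paths and write it back."""
    path = Path(metadata_path)
    metadata = json.loads(path.read_text(encoding="utf-8"))
    add_converted_paths(metadata, converted_dir)
    path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
```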

Metrics

The metrics include automatic mean opinion score assessment, character error rate, and speaker verification acceptance rate.

Automatic mean opinion score assessment:
- An ensemble of several MBNet models, implemented by sky1456723.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/mean_opinion_score
```
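The README does not say how the MBNet predictions are combined. The sketch below assumes the simplest ensemble: it averages the models' predicted scores for each utterance, then averages over utterances. The `predictors` are hypothetical stand-ins for loaded MBNet checkpoints, each mapping a wav path to a predicted score; they do not reflect MBNet's real interface.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a predictor maps a wav path to a predicted MOS.
Predictor = Callable[[str], float]


def ensemble_mos(
    wav_paths: Sequence[str], predictors: List[Predictor]
) -> Tuple[Dict[str, float], float]:
    """Average predicted MOS over models (per utterance) and over utterances."""
    if not predictors or not wav_paths:
        raise ValueError("need at least one predictor and one utterance")
    # Per-utterance score: mean of every model's prediction for that file.
    per_utt = {path: mean([predict(path) for predict in predictors]) for path in wav_paths}
    # System-level score: mean over all utterances.
    overall = mean(list(per_utt.values()))
    return per_utt, overall
```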
Character error rate:
- Use the automatic speech recognition model provided by Hugging Face.
- The model's word error rate on LibriSpeech test-other is 3.9%.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/character_error_rate
```
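Character error rate is the character-level edit distance between a reference and a hypothesis, divided by the number of reference characters. Here the reference would presumably be the content of the source utterance and the hypothesis the ASR transcript of the converted audio. The sketch below assumes that, plus a hypothetical `normalize` step (upper-casing and stripping punctuation). The toolkit's own text normalization may differ.

```python
import re
from typing import Sequence


def normalize(text: str) -> str:
    """Hypothetical normalization: upper-case, keep letters/apostrophes/spaces."""
    text = re.sub(r"[^A-Z' ]", " ", text.upper())
    return " ".join(text.split())


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Total edit distance divided by total reference length (corpus-level CER)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    refs = [normalize(r) for r in references]
    hyps = [normalize(h) for h in hypotheses]
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references are empty after normalization")
    return sum(levenshtein(r, h) for r, h in zip(refs, hyps)) / total
```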
Speaker verification acceptance rate:
- You can calculate the threshold with the scripts in metrics/speaker_verification/equal_error_rate/.
- Some precalculated thresholds are in metrics/speaker_verification/equal_error_rate/threshold.yaml.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/speaker_verification -t [target_dir] -th [threshold path]
```
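The README does not spell out the scoring rule, so the sketch below uses one common setup and assumes all of it: speaker embeddings are compared with cosine similarity, the threshold is picked at the equal error rate (where the false acceptance and false rejection rates meet) from same-speaker ("genuine") and different-speaker ("impostor") trials, and a converted utterance counts as accepted when its score against the target speaker is at least that threshold. The embedding model itself is out of scope here.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def eer_threshold(genuine: Sequence[float], impostor: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, eer) where FAR and FRR are closest.

    A trial is accepted when its score is >= threshold.
    """
    if not genuine or not impostor:
        raise ValueError("need both genuine and impostor scores")
    best = None  # (|FAR - FRR|, threshold, EER estimate)
    for threshold in sorted(set(genuine) | set(impostor)):
        frr = sum(s < threshold for s in genuine) / len(genuine)     # false rejections
        far = sum(s >= threshold for s in impostor) / len(impostor)  # false acceptances
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, threshold, (far + frr) / 2)
    return best[1], best[2]


def acceptance_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of converted utterances accepted as the target speaker."""
    if not scores:
        raise ValueError("no scores given")
    return sum(s >= threshold for s in scores) / len(scores)
```

A threshold computed this way on real verification trials plays the same role as the values passed with -th.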

Releases: checkpoints (May 17, 2021)
