An evaluation toolkit for voice conversion models.

Last update: Aug 29, 2022

Overview

Voice-conversion-evaluation

Sample test pair

Generate the metadata used to evaluate models.
The parsers directory contains the available corpus parsers.

  python sampler.py [name of source corpus] [path of source dir] [name of target corpus] [path of target dir] -n [number of samples] -nt [number of target utterances] -o [path of output dir]

The pairs in the metadata are sorted by src_second, from longest to shortest.
The metadata contains:

source_corpus: The name of the source corpus.
source_corpus_speaker_number: The number of speakers in the source corpus.
source_random_seed: The random seed used for sampling source utterances.
target_corpus: The name of the target corpus.
target_corpus_speaker_number: The number of speakers in the target corpus.
target_random_seed: The random seed used for sampling target utterances.
n_samples: The number of samples.
n_target_samples: The number of target utterances.
pairs: The list of evaluation pairs.
- source_speaker: The name of the source speaker.
- target_speaker: The name of the target speaker.
- src_utt: The path of the source utterance, relative to the source dir.
- tgt_utts: The list of paths of the target utterances, relative to the target dir.
- content: The content of the source utterance.
- src_second: The duration of the source utterance in seconds.
- converted: This entry is not written by the sampler. Add it yourself, with the relative path of your converted output.
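
As a rough illustration of the last field, the sketch below adds a `converted` entry to every pair of an in-memory metadata dictionary. It assumes the metadata is JSON-like (the on-disk format is not stated here), and the function name `add_converted_paths`, the `converted` directory, the index-based file names, and the example speaker paths are all hypothetical.

```python
import json
from pathlib import Path


def add_converted_paths(metadata: dict, converted_dir: str = "converted") -> dict:
    """Return a copy of the metadata with a `converted` entry on every pair."""
    # Deep copy through a JSON round trip so the input is left untouched.
    updated = json.loads(json.dumps(metadata))
    for index, pair in enumerate(updated["pairs"]):
        # Hypothetical naming scheme: one output file per pair, named by index.
        pair["converted"] = (Path(converted_dir) / f"{index:04d}.wav").as_posix()
    return updated


# Made-up metadata with only the fields needed for the illustration.
example = {
    "n_samples": 2,
    "pairs": [
        {"src_utt": "spk_a/utt_001.wav", "tgt_utts": ["spk_b/utt_002.wav"], "src_second": 3.2},
        {"src_utt": "spk_c/utt_003.wav", "tgt_utts": ["spk_d/utt_004.wav"], "src_second": 1.9},
    ],
}

result = add_converted_paths(example)
print(json.dumps(result["pairs"][0], indent=2))
```

In practice the path written to `converted` should point at wherever your model actually saved the output for that pair.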

Metrics

The metrics include automatic mean opinion score assessment, character error rate, and speaker verification acceptance rate.

Automatic mean opinion score assessment:
- Uses an ensemble of several MBNet models, as implemented by sky1456723.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/mean_opinion_score
```
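
The idea of ensembling several predictors can be sketched as follows. This is only an illustration and assumes that the ensemble score is the plain average of the individual predictions, which is not confirmed here. The predictors are stand-in functions, not real MBNet models, and `ensemble_score` is a hypothetical name.

```python
from statistics import mean
from typing import Callable, Sequence

# A predictor maps an utterance (here just a path string) to a score.
Predictor = Callable[[str], float]


def ensemble_score(predictors: Sequence[Predictor], utterance: str) -> float:
    """Average the scores that several predictors assign to one utterance."""
    if not predictors:
        raise ValueError("at least one predictor is required")
    return mean(predict(utterance) for predict in predictors)


# Stand-in predictors returning fixed scores, for illustration only.
fake_predictors = [lambda _: 3.0, lambda _: 4.0, lambda _: 3.5]
score = ensemble_score(fake_predictors, "converted/0000.wav")
print(f"ensemble score: {score:.2f}")
```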
Character error rate:
- Uses the automatic speech recognition model provided by Hugging Face.
- The model's word error rate on LibriSpeech test-other is 3.9.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/character_error_rate
```
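
For reference, character error rate is commonly defined as the character-level edit distance between the reference text and the recognized text, divided by the length of the reference. The sketch below implements that standard definition. It is not the toolkit's own code, and any text normalization applied before scoring is not covered.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character out of eleven reference characters.
cer = character_error_rate("hello world", "hallo world")
print(f"CER: {cer:.3f}")
```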
Speaker verification acceptance rate:
- You can calculate the threshold with the code in metrics/speaker_verification/equal_error_rate/.
- Some pre-calculated thresholds are available in metrics/speaker_verification/equal_error_rate/threshold.yaml.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/speaker_verification -t [target_dir] -th [threshold path]
```
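
The acceptance-rate idea can be sketched like this: each converted utterance is compared with its target speaker, and it counts as accepted when the similarity reaches the threshold. The sketch assumes cosine similarity between fixed-size speaker embeddings, which is a common choice but is not confirmed here. The embeddings are made-up vectors and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def acceptance_rate(
    converted: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    threshold: float,
) -> float:
    """Fraction of (converted, target) pairs whose similarity meets the threshold."""
    if len(converted) != len(targets) or not converted:
        raise ValueError("need the same non-zero number of converted and target embeddings")
    accepted = sum(
        cosine_similarity(c, t) >= threshold for c, t in zip(converted, targets)
    )
    return accepted / len(converted)


# Made-up 2-D embeddings: the first pair matches, the second does not.
converted_embeddings = [[1.0, 0.0], [0.0, 1.0]]
target_embeddings = [[1.0, 0.0], [1.0, 0.0]]
rate = acceptance_rate(converted_embeddings, target_embeddings, threshold=0.5)
print(f"acceptance rate: {rate:.2f}")
```

The threshold itself would come from the equal error rate calculation or from threshold.yaml, as described above.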

Releases(checkpoints)

checkpoints(May 17, 2021)
