An evaluation toolkit for voice conversion models.

Overview

Voice-conversion-evaluation

An evaluation toolkit for voice conversion models.

Sample test pair

Generate the metadata for evaluating models.
The directory of parsers contains several available corpus parsers.

  python sampler.py [name of source corpus] [path of source dir] [name of target corpus] [path of target dir] -n [number of samples] -nt [number of target utterances] -o [path of output dir]

The pairs of metadata are sorted by src_second for long to short.
The metadata contains:

  • source_corpus: The name of the source corpus.
  • source_corpus_speaker_number: The number of speaker in source corpus.
  • source_random_seed: Random seed used for sampling source utterance.
  • target_corpus: The name of the target corpus.
  • target_corpus_speaker_number: The number of speaker in target corpus.
  • target_random_seed: Random seed used for sampling target utterances.
  • n_samples: number of samples
  • n_target_samples: number of target utterances
  • pairs: List of evaluating pairs
    • source_speaker: The name of the source speaker.
    • target_speaker: The name of the target speaker.
    • src_utt: The relative path of the source utterance, which is relative to the source dir.
    • tgt_utts: List of the relative path of target utterances, which is relative to the target dir.
    • content: The content of the source utterance.
    • src_second: The second of the source utterance.
    • converted: The entry does not appear when use sampler, you need to add the relative path for your converted output.

Metrics

The metrics include automatic mean opinion score assessment, character error rate, and speaker verification acceptance rate.

  • Automatic mean opinion score assessment
    • Ensemble several MBNet which is implemented by sky1456723.
      python calculate_objective_metric.py -d [data_dir] -r metrics/mean_opinion_score
    
  • Character error rate:
    • Use the automatic speech recognition model provided by Hugging Face.
    • The word error rate on Librispeech test-other is 3.9.
      python calculate_objective_metric.py -d [data_dir] -r metrics/character_error_rate
    
  • Speaker verification acceptance rate:
    • You can calculate the threshold by metrics/speaker_verification/equal_error_rate/.
    • And some pre-calculated thresholds are in metrics/speaker_verification/equal_error_rate/threshold.yaml.
      python calculate_objective_metric.py -d [data_dir] -r metrics/speaker_verification -t [target_dir] -th [threshold path]
    
You might also like...
Installation, test and evaluation of Scribosermo speech-to-text engine

Scribosermo STT Setup Scribosermo is a LGPL licensed, open-source speech recognition engine to "Train fast Speech-to-Text networks in different langua

GCRC: A Gaokao Chinese Reading Comprehension dataset for interpretable Evaluation

GCRC GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Eva

Common Voice Dataset explorer

Common Voice Dataset Explorer Common Voice Dataset is by Mozilla Made during huggingface finetuning week Usage pip install -r requirements.txt streaml

Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and convert them into audio. Here I have used Google-text-to-speech library popularly known as gTTS library to convert text file to .mp3 file. Hope you like my project!
Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

MLP Singer Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis. Audio samples are available on our demo page.

Chinese real time voice cloning (VC) and Chinese text to speech (TTS).
Chinese real time voice cloning (VC) and Chinese text to speech (TTS).

Chinese real time voice cloning (VC) and Chinese text to speech (TTS). 好用的中文语音克隆兼中文语音合成系统,包含语音编码器、语音合成器、声码器和可视化模块。

Clone a voice in 5 seconds to generate arbitrary speech in real-time
Clone a voice in 5 seconds to generate arbitrary speech in real-time

This repository is forked from Real-Time-Voice-Cloning which only support English. English | 中文 Features 🌍 Chinese supported mandarin and tested with

The simple project to separate mixed voice (2 clean voices) to 2 separate voices.
The simple project to separate mixed voice (2 clean voices) to 2 separate voices.

Speech Separation The simple project to separate mixed voice (2 clean voices) to 2 separate voices. Result Example (Clisk to hear the voices): mix ||

Every Google, Azure & IBM text to speech voice for free

TTS-Grabber Quick thing i made about a year ago to download any text with any tts voice, over 630 voices to choose from currently. It will split the i

Releases(checkpoints)
This repository contains examples of Task-Informed Meta-Learning

Task-Informed Meta-Learning This repository contains examples of Task-Informed Meta-Learning (paper). We consider two tasks: Crop Type Classification

10 Dec 19, 2022
💫 Industrial-strength Natural Language Processing (NLP) in Python

spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc

Explosion 24.9k Jan 02, 2023
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

The implementation of paper CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval. CLIP4Clip is a video-text retrieval model based

ArrowLuo 456 Jan 06, 2023
Final Project Bootcamp Zero

The Quest (Pygame) Descripción Este es el repositorio de código The-Quest para el proyecto final Bootcamp Zero de KeepCoding. El juego consiste en la

Seven-z01 1 Mar 02, 2022
Convolutional 2D Knowledge Graph Embeddings resources

ConvE Convolutional 2D Knowledge Graph Embeddings resources. Paper: Convolutional 2D Knowledge Graph Embeddings Used in the paper, but do not use thes

Tim Dettmers 586 Dec 24, 2022
Open source annotation tool for machine learning practitioners.

doccano doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequ

7.1k Jan 01, 2023
Python port of Google's libphonenumber

phonenumbers Python Library This is a Python port of Google's libphonenumber library It supports Python 2.5-2.7 and Python 3.x (in the same codebase,

David Drysdale 3.1k Dec 29, 2022
Code for ACL 2020 paper "Rigid Formats Controlled Text Generation"

SongNet SongNet: SongCi + Song (Lyrics) + Sonnet + etc. @inproceedings{li-etal-2020-rigid, title = "Rigid Formats Controlled Text Generation",

Piji Li 212 Dec 17, 2022
Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipB

Jie Lei 雷杰 612 Jan 04, 2023
This is a project built for FALLABOUT2021 event under SRMMIC, This project deals with NLP poetry generation.

FALLABOUT-SRMMIC 21 POETRY-GENERATION HINGLISH DESCRIPTION We have developed a NLP(natural language processing) model which automatically generates a

7 Sep 28, 2021
AIDynamicTextReader - A simple dynamic text reader based on Artificial intelligence

AI Dynamic Text Reader: This is a simple dynamic text reader based on Artificial

Md. Rakibul Islam 1 Jan 18, 2022
InferSent sentence embeddings

InferSent InferSent is a sentence embeddings method that provides semantic representations for English sentences. It is trained on natural language in

Facebook Research 2.2k Dec 27, 2022
Chinese version of GPT2 training code, using BERT tokenizer.

GPT2-Chinese Description Chinese version of GPT2 training code, using BERT tokenizer or BPE tokenizer. It is based on the extremely awesome repository

Zeyao Du 5.6k Jan 04, 2023
Two-stage text summarization with BERT and BART

Two-Stage Text Summarization Description We experiment with a 2-stage summarization model on CNN/DailyMail dataset that combines the ability to filter

Yukai Yang (Alexis) 6 Oct 22, 2022
Traditional Chinese Text Recognition Dataset: Synthetic Dataset and Labeled Data

Traditional Chinese Text Recognition Dataset: Synthetic Dataset and Labeled Data Authors: Yi-Chang Chen, Yu-Chuan Chang, Yen-Cheng Chang and Yi-Ren Ye

Yi-Chang Chen 5 Dec 15, 2022
Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

GPT2-Pytorch with Text-Generator Better Language Models and Their Implications Our model, called GPT-2 (a successor to GPT), was trained simply to pre

Tae-Hwan Jung 775 Jan 08, 2023
jel - Japanese Entity Linker - is Bi-encoder based entity linker for japanese.

jel: Japanese Entity Linker jel - Japanese Entity Linker - is Bi-encoder based entity linker for japanese. Usage Currently, link and question methods

izuna385 10 Jan 06, 2023
Speech Recognition Database Management with python

Speech Recognition Database Management The main aim of this project is to recogn

Abhishek Kumar Jha 2 Feb 02, 2022
Multilingual finetuning of Machine Translation model on low-resource languages. Project for Deep Natural Language Processing course.

Low-resource-Machine-Translation This repository contains the code for the project relative to the course Deep Natural Language Processing. The goal o

Andrea Cavallo 3 Jun 22, 2022
gaiic2021-track3-小布助手对话短文本语义匹配复赛rank3、决赛rank4

决赛答辩已经过去一段时间了,我们队伍ac milan最终获得了复赛第3,决赛第4的成绩。在此首先感谢一些队友的carry~ 经过2个多月的比赛,学习收获了很多,也认识了很多大佬,在这里记录一下自己的参赛体验和学习收获。

102 Dec 19, 2022