PyTorch implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"

Last update: Nov 18, 2022

MOSNet

PyTorch implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion" https://arxiv.org/abs/1904.08352
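For orientation, the paper trains MOSNet with an objective that combines an utterance-level error and a frame-level error. The block below restates it in notation chosen here: S training utterances, T_s frames in utterance s, frame-level predictions q_{s,t}, ground-truth MOS Q_s, and a weighting factor α. Treat it as a summary and check the paper for the authors' exact formulation.

```latex
\hat{Q}_s = \frac{1}{T_s}\sum_{t=1}^{T_s} q_{s,t},
\qquad
O = \frac{1}{S}\sum_{s=1}^{S}\left[\left(Q_s - \hat{Q}_s\right)^2
    + \frac{\alpha}{T_s}\sum_{t=1}^{T_s}\left(Q_s - q_{s,t}\right)^2\right]
```

The predicted utterance score \hat{Q}_s is the average of the frame scores. The second term pushes each frame-level prediction toward the utterance's MOS, and setting α = 0 leaves only the utterance-level error.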

Dependency

Linux Ubuntu 20.04

GPU: GeForce RTX 2080 Ti
CUDA version: 10.0

Python 3.7

pytorch==1.4.0
numpy==1.19.5
tqdm
scipy==1.6.2
pandas==1.2.4
matplotlib
librosa==0.6.0
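The helper below is one quick way to compare a local environment against the pinned versions above. It is an illustrative sketch and not part of this repository. It assumes PyPI distribution names: PyTorch is published on PyPI as torch, while conda calls it pytorch. It also ignores local version labels (the part after "+", as in CUDA- or CPU-specific wheels).

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: pip install importlib_metadata
    import importlib_metadata as metadata

# Pinned versions from the dependency list (unpinned packages omitted).
# Assumption: PyPI names, so PyTorch appears as "torch".
PINS = {
    "torch": "1.4.0",
    "numpy": "1.19.5",
    "scipy": "1.6.2",
    "pandas": "1.2.4",
    "librosa": "0.6.0",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for every pin that is missing or differs."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed.get(name)
        # Drop PEP 440 local labels such as "+cpu" before comparing.
        public = found.split("+")[0] if found else None
        if public != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINS, installed_versions(PINS)).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty report means every pinned package matches. Unpinned packages such as tqdm and matplotlib are not checked.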

Usage

Reproducing results in the paper

Run cd ./data and then bash download.sh to download the VCC2018 evaluation results and the submitted speech. (Downsampling the submitted speech may take some time.)
Run python mos_results_preprocess.py to prepare the evaluation results. (Optionally, run python bootsrap_estimation.py to perform the bootstrap experiment for the intrinsic MOS calculation.)
Run python utils.py to convert the .wav files into .h5 files.
Run python train.py -c config.json to train the CNN-BLSTM version of MOSNet.
Run python test.py -c config.json --epoch BEST_EPOCH --is_fp16 to test the CNN-BLSTM version of MOSNet, replacing BEST_EPOCH with the epoch number of the best checkpoint.
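Objective MOS predictors such as MOSNet are commonly evaluated at two levels, per utterance and per system. The usual metrics are the mean squared error (MSE), the linear correlation coefficient (LCC) and Spearman's rank correlation coefficient (SRCC). The pure-Python sketch below computes all three. It is not the code in test.py. It also assumes that a system's MOS can be taken as the average of its utterance-level MOS, which is exact only when every utterance received the same number of ratings.

```python
import math
from collections import defaultdict


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x, y):
    """Linear correlation coefficient (LCC); inputs must not be constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def _scores(y_true, y_pred):
    return {"MSE": mse(y_true, y_pred),
            "LCC": pearson(y_true, y_pred),
            "SRCC": spearman(y_true, y_pred)}


def evaluate(systems, true_mos, pred_mos):
    """Return (utterance-level, system-level) metric dictionaries.

    Assumption: a system's MOS is the mean of its utterance-level MOS.
    """
    true_by_sys, pred_by_sys = defaultdict(list), defaultdict(list)
    for sys_id, t, p in zip(systems, true_mos, pred_mos):
        true_by_sys[sys_id].append(t)
        pred_by_sys[sys_id].append(p)
    ids = sorted(true_by_sys)
    sys_true = [sum(true_by_sys[i]) / len(true_by_sys[i]) for i in ids]
    sys_pred = [sum(pred_by_sys[i]) / len(pred_by_sys[i]) for i in ids]
    return _scores(true_mos, pred_mos), _scores(sys_true, sys_pred)
```

A call such as evaluate(system_ids, true_mos, predicted_mos) returns two dictionaries, each with MSE, LCC and SRCC entries. Both correlations need at least two distinct values, which at the system level means at least two systems.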

Note

Thanks to the authors of the MOSNet paper; this code is based on their TensorFlow implementation https://github.com/lochenchou/MOSNet. However, on my workstation (RTX 2080 Ti) that implementation runs out of GPU memory (OOM) under TensorFlow 2.0 even with BATCH_SIZE=4, so I reimplemented it in PyTorch. The PyTorch version currently uses only 7700 MiB of GPU memory with BATCH_SIZE=64. If you find any problems with my code, please open an issue.

Citation

If you find this work useful in your research, please consider citing:

@inproceedings{mosnet,
  author={Lo, Chen-Chou and Fu, Szu-Wei and Huang, Wen-Chin and Wang, Xin and Yamagishi, Junichi and Tsao, Yu and Wang, Hsin-Min},
  title={MOSNet: Deep Learning based Objective Assessment for Voice Conversion},
  year=2019,
  booktitle={Proc. Interspeech 2019},
}

License

This work is released under the MIT License (see the LICENSE file for details).

VCC2018 Database & Results

The model is trained on the large-scale listening test results released by the Voice Conversion Challenge 2018 (VCC2018).
The listening test results can be downloaded from here
The databases and results (submitted speech) can be downloaded from here
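A mean opinion score (MOS) is the average of the listener ratings a stimulus receives. The sketch below shows that aggregation on a hypothetical list of (system_id, utterance_id, rating) tuples. The real VCC2018 result files use their own layout, and this is not the logic of mos_results_preprocess.py.

```python
from collections import defaultdict


def _mean_by(ratings, key_index):
    """Average the rating (last field) of every record, grouped by one field."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in ratings:
        key = record[key_index]
        totals[key] += record[-1]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def utterance_mos(ratings):
    """MOS per utterance: mean of all listener ratings of that utterance."""
    return _mean_by(ratings, key_index=1)


def system_mos(ratings):
    """MOS per system: mean of all ratings given to that system's utterances."""
    return _mean_by(ratings, key_index=0)


if __name__ == "__main__":
    # Made-up ratings for illustration only (hypothetical IDs, 1-5 scale).
    ratings = [
        ("S01", "S01_u1", 3), ("S01", "S01_u1", 5),
        ("S01", "S01_u2", 2), ("S02", "S02_u1", 4),
    ]
    print(utterance_mos(ratings))  # {'S01_u1': 4.0, 'S01_u2': 2.0, 'S02_u1': 4.0}
    print(system_mos(ratings))     # S01 averages 3, 5 and 2; S02 is 4.0
```

Averaging all ratings of a system gives the same result as averaging its utterance MOS only when every utterance has the same number of ratings.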
