Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"

Last update: Nov 20, 2022

Overview

LDNet

Author: Wen-Chin Huang (Nagoya University) Email: [email protected]

This is the official implementation of the paper "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech". The model takes a synthetic speech sample as input and outputs a simulated human rating, i.e. a predicted mean opinion score (MOS).

Usage

Currently we support only the VCC2018 dataset. We plan to release the BVCC dataset in the near future.

Requirements

PyTorch 1.9 (other reasonably recent versions should also work)
librosa
pandas
h5py
scipy
matplotlib
tqdm

Data preparation

# Download the VCC2018 dataset.
cd data
./download.sh vcc2018

Training

We provide configs that correspond to the following rows in the above figure:

(a): MBNet.yaml
(d): LDNet_MobileNetV3_RNN_5e-3.yaml
(e): LDNet_MobileNetV3_FFN_1e-3.yaml
(f): LDNet-MN_MobileNetV3_RNN_FFN_1e-3_lamb4.yaml
(g): LDNet-ML_MobileNetV3_FFN_1e-3.yaml

python train.py --config configs/<config_name> --tag <tag_name>

By default, the experimental results will be stored in exp/<tag_name>, including:

model-<steps>.pt: model checkpoints.
config.yml: the config file.
idtable.pkl: the dictionary that maps listener to ID.
training_<inference_mode>: the validation results generated during training. This file is useful for model selection. Note that the inference_mode setting in the config file decides which mode is used for validation during training.
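
To look up which ID a listener was assigned, idtable.pkl can be opened directly. The following is a minimal sketch, assuming the file is a plain pickled Python dict from listener name to integer ID. The helper load_idtable and the listener names are hypothetical and used only for illustration; the demo writes a made-up table to a temporary directory instead of reading a real experiment folder.

```python
import os
import pickle
import tempfile


def load_idtable(path):
    """Load a pickled listener-to-ID dictionary (assumed format)."""
    with open(path, "rb") as f:
        table = pickle.load(f)
    if not isinstance(table, dict):
        raise TypeError("expected a dict, got %s" % type(table).__name__)
    return table


# Demo with a made-up table; a real one would live in exp/<tag_name>/idtable.pkl.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "idtable.pkl")
    with open(path, "wb") as f:
        pickle.dump({"listener_A": 0, "listener_B": 1}, f)
    idtable = load_idtable(path)

# Invert the mapping to go from ID back to listener name.
id_to_listener = {idx: name for name, idx in idtable.items()}
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.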

There are some arguments that can be changed:

--exp_dir: The directory for storing the experimental results.
--data_dir: The data directory. Default is data/vcc2018.
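--seed: The random seed.
--update_freq: The number of forward passes over which gradients are accumulated before each update. This is very important. See the section on batch size and update_freq below.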

Batch size and `update_freq`

By default, all LDNet models are trained with a batch size of 60. In my experiments, I used a single NVIDIA GeForce RTX 3090 with 24 GB of memory for training. A full batch does not fit on this GPU, so I accumulate gradients over update_freq forward passes and then perform a single parameter update. Before training, please check train_batch_size in the config file and set update_freq so that train_batch_size times update_freq equals 60. For instance, in configs/LDNet_MobileNetV3_FFN_1e-3.yaml the train_batch_size is 20, so update_freq should be set to 3.
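
The arithmetic behind update_freq can be sketched in plain Python. This is an illustration under stated assumptions, not the code in train.py: the helper names (required_update_freq, mse_grad, accumulated_grad) are hypothetical, a one-parameter linear model stands in for LDNet, and each micro-batch gradient is assumed to be scaled by 1/update_freq. Under those assumptions, accumulating over 3 micro-batches of 20 gives the same gradient as one batch of 60.

```python
import math
import random


def required_update_freq(target_batch_size, train_batch_size):
    """Number of micro-batches to accumulate to reach the target batch size."""
    if train_batch_size <= 0 or target_batch_size % train_batch_size != 0:
        raise ValueError("target batch size must be a positive multiple of train_batch_size")
    return target_batch_size // train_batch_size


def mse_grad(w, batch):
    """Gradient of mean((w * x - y)^2) over the batch with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Sum micro-batch gradients, each scaled by 1/update_freq (assumed scaling)."""
    update_freq = required_update_freq(len(batch), micro_batch_size)
    total = 0.0
    for i in range(update_freq):
        micro = batch[i * micro_batch_size:(i + 1) * micro_batch_size]
        total += mse_grad(w, micro) / update_freq
    return total


random.seed(0)
# 60 toy (input, score) pairs; scores lie in the MOS range 1 to 5.
data = [(random.uniform(-1.0, 1.0), random.uniform(1.0, 5.0)) for _ in range(60)]

update_freq = required_update_freq(60, 20)
full_batch_grad = mse_grad(0.5, data)
accumulated = accumulated_grad(0.5, data, 20)
```

If the micro-batch gradients were summed without the 1/update_freq scaling, the result would be update_freq times larger, which acts like a larger learning rate.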

Inference

python inference.py --tag LDNet-ML_MobileNetV3_FFN_1e-3 --mode mean_listener

Use --mode to specify which inference mode to use. Choices are: mean_net, all_listeners and mean_listener. By default, all checkpoints in the exp directory will be evaluated.

There are some arguments that can be changed:

--ep: to evaluate a single model checkpoint, say, model-10000.pt, pass --ep 10000.
--start_ep: to evaluate only the model checkpoints after a certain number of steps, say, 10000 steps, pass --start_ep 10000.
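
The effect of these two arguments on checkpoint selection can be sketched as follows. This is not the logic in inference.py: select_checkpoints is a hypothetical helper, and treating start_ep as an inclusive lower bound is an assumption. It only relies on the model-<steps>.pt naming described above.

```python
import re

# Matches checkpoint names of the form model-<steps>.pt
CKPT_PATTERN = re.compile(r"^model-(\d+)\.pt$")


def select_checkpoints(filenames, ep=None, start_ep=None):
    """Return (steps, filename) pairs sorted by step count.

    ep: keep only the checkpoint saved at exactly this step.
    start_ep: keep checkpoints at or after this step (inclusive bound is assumed).
    """
    found = []
    for name in filenames:
        match = CKPT_PATTERN.match(name)
        if match is None:
            continue  # skip non-checkpoint files such as config.yml
        steps = int(match.group(1))
        if ep is not None and steps != ep:
            continue
        if start_ep is not None and steps < start_ep:
            continue
        found.append((steps, name))
    return sorted(found)


# Made-up directory listing for illustration.
files = ["config.yml", "idtable.pkl", "model-5000.pt", "model-15000.pt", "model-10000.pt"]
all_ckpts = select_checkpoints(files)
single = select_checkpoints(files, ep=10000)
later = select_checkpoints(files, start_ep=10000)
```

Sorting by the parsed integer rather than by filename keeps model-5000.pt ahead of model-10000.pt.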

After the evaluation, you can inspect the following files:

<dataset_name>_<inference_mode>.csv: the validation and test set results.
<dataset_name>_<inference_mode>_<test/valid>/: figures that visualize the prediction distributions, including:
- <ep>_distribution.png: distribution over the score range (1-5).
- <ep>_utt_scatter_plot_utt: utterance-wise scatter plot of the ground truth and the predicted scores.
- <ep>_sys_scatter_plot_utt: system-wise scatter plot of the ground truth and the predicted scores.

Acknowledgement

This repository inherits from this great unofficial MBNet implementation.

Citation

If you find this recipe useful, please consider citing the following paper:

@article{huang2021ldnet,
  title={LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech},
  author={Huang, Wen-Chin and Cooper, Erica and Yamagishi, Junichi and Toda, Tomoki},
  journal={arXiv preprint arXiv:2110.09103},
  year={2021}
}

Owner

Wen-Chin Huang (unilight)
