Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

Last update: Dec 20, 2022

Overview

StrengthNet

https://arxiv.org/abs/2110.03156

Dependency

Ubuntu 18.04.5 LTS

GPU: Quadro RTX 6000
Driver version: 450.80.02
CUDA version: 11.0

Python 3.5

tensorflow-gpu 2.0.0b1 (cudnn=7.6.0)
scipy
pandas
matplotlib
librosa

Environment set-up

For example,

conda create -n strengthnet python=3.5
conda activate strengthnet
pip install -r requirements.txt
conda install cudnn=7.6.0
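
After the environment is created, a quick check that the listed packages can be imported may save a failed training run. The snippet below is a minimal sketch and is not part of this repository; the helper name missing_modules and the list REQUIRED_MODULES are hypothetical, and the module names are assumed to match the packages listed under Dependency (the pip package tensorflow-gpu is imported as tensorflow).

```python
import importlib.util

# Import names for the packages listed under "Dependency".
# Note: the pip package tensorflow-gpu is imported as "tensorflow".
REQUIRED_MODULES = ["tensorflow", "scipy", "pandas", "matplotlib", "librosa"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check only looks the modules up without importing them, so it does not verify package versions or that the GPU is visible to TensorFlow.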

Usage

Run python utils.py to convert the .wav files to .h5 files.
Run python train.py to train a CNN-BLSTM based StrengthNet.

Evaluating new samples

Put the waveforms you wish to evaluate in a folder. For example, / /
Run python test.py --rootdir / /

This script will evaluate all the .wav files in / /, and write the results to / / /StrengthNet_result_raw.txt.

By default, the output/strengthnet.h5 pretrained model is used.
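
The folder-level behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the code in test.py: the function evaluate_folder and the callable score_fn are hypothetical stand-ins for the model inference, only files directly inside the folder are scanned, and the one-line-per-file output format is assumed.

```python
import os


def evaluate_folder(rootdir, score_fn, result_name="StrengthNet_result_raw.txt"):
    """Score every .wav file directly inside rootdir and write one line per file.

    score_fn is a hypothetical callable that takes a file path and returns a
    float; it stands in for the model inference that test.py performs.
    """
    wav_names = sorted(
        name for name in os.listdir(rootdir) if name.lower().endswith(".wav")
    )
    result_path = os.path.join(rootdir, result_name)
    with open(result_path, "w") as handle:
        for name in wav_names:
            score = score_fn(os.path.join(rootdir, name))
            # Assumed line format "<file name> <score>"; the real script may differ.
            handle.write("{} {:.3f}\n".format(name, score))
    return result_path
```

Passing the scorer as an argument keeps the file handling separate from the model, so the folder logic can be tried with a dummy function before a trained model is available.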

Citation

If you find this work useful in your research, please consider citing:

@misc{liu2021strengthnet,
      title={StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis}, 
      author={Rui Liu and Berrak Sisman and Haizhou Li},
      year={2021},
      eprint={2110.03156},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

Resources

The ESD corpus is released by the HLT lab, NUS, Singapore.

The strength scores for the English samples of the ESD corpus are available here.

Acknowledgements:

MOSNet: https://github.com/lochenchou/MOSNet

Relative Attributes: Relative Attributes

License

This work is released under the MIT License (see the LICENSE file for details).

Owner

RuiLiu
