Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

Last update: Dec 20, 2022

Related tags

Deep Learning StrengthNet

Overview

StrengthNet

Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

https://arxiv.org/abs/2110.03156

Dependency

Ubuntu 18.04.5 LTS

GPU: Quadro RTX 6000
Driver version: 450.80.02
CUDA version: 11.0

Python 3.5

tensorflow-gpu 2.0.0b1 (cudnn=7.6.0)
scipy
pandas
matplotlib
librosa

Environment set-up

For example,

conda create -n strengthnet python=3.5
conda activate strengthnet
pip install -r requirements.txt
conda install cudnn=7.6.0

Usage

Run python utils.py to extract .wav to .h5;
Run python train.py to train a CNN-BLSTM based StrengthNet;

Evaluating new samples

Put the waveforms you wish to evaluate in a folder. For example, / /
Run python test.py --rootdir / /

This script will evaluate all the .wav files in / /, and write the results to / / /StrengthNet_result_raw.txt.

By default, the output/strengthnet.h5 pretrained model is used.

Citation

If you find this work useful in your research, please consider citing:

@misc{liu2021strengthnet,
      title={StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis}, 
      author={Rui Liu and Berrak Sisman and Haizhou Li},
      year={2021},
      eprint={2110.03156},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

Resources

The ESD corpus is released by the HLT lab, NUS, Singapore.

The strength scores for the English samples of the ESD corpus are available here.

Acknowledgements:

MOSNet: https://github.com/lochenchou/MOSNet

Relative Attributes: Relative Attributes

License

This work is released under MIT License (see LICENSE file for details).

Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

Related tags

Overview

StrengthNet

Dependency

Environment set-up

Usage

Evaluating new samples

Citation

Resources

Acknowledgements:

License

Owner

RuiLiu

PyTorch implementation of image classification models for CIFAR-10/CIFAR-100/MNIST/FashionMNIST/Kuzushiji-MNIST/ImageNet

Measures input lag without dedicated hardware, performing motion detection on recorded or live video

A scikit-learn compatible neural network library that wraps PyTorch

Official implementation of UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation

AutoML library for deep learning

MIRACLE (Missing data Imputation Refinement And Causal LEarning)

Code & Data for Enhancing Photorealism Enhancement

Edge Restoration Quality Assessment

Look Who’s Talking: Active Speaker Detection in the Wild

repro_eval is a collection of measures to evaluate the reproducibility/replicability of system-oriented IR experiments

Cookiecutter PyTorch Lightning

Convnet transfer - Code for paper How transferable are features in deep neural networks?

Code for the KDD 2021 paper 'Filtration Curves for Graph Representation'

Evaluation suite for large-scale language models.

Network Enhancement implementation in pytorch

Bilinear attention networks for visual question answering

Moer Grounded Image Captioning by Distilling Image-Text Matching Model

Data augmentation for NLP, accepted at EMNLP 2021 Findings

Deep Markov Factor Analysis (NeurIPS2021)

An efficient PyTorch library for Global Wheat Detection using YOLOv5. The project is based on this Kaggle competition Global Wheat Detection (2021).