MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]

Last update: Dec 28, 2022

Related tags

Overview

MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]

by Kaisiyuan Wang, Qianyi Wu, Linsen Song, Zhuoqian Yang, Wayne Wu, Chen Qian, Ran He, Yu Qiao, Chen Change Loy.

Introduction

This repository is for our ECCV2020 paper MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation.

Multi-view Emotional Audio-visual Dataset

To cope with the challenge of realistic and natural emotional talking face genertaion, we build the Multi-view Emotional Audio-visual Dataset (MEAD) which is a talking-face video corpus featuring 60 actors and actresses talking with 8 different emotions at 3 different intensity levels. High-quality audio-visual clips are captured at 7 different view angles in a strictly-controlled environment. Together with the dataset, we also release an emotional talking-face generation baseline which enables the manipulation of both emotion and its intensity. For more specific information about the dataset, please refer to here.

Installation

This repository is based on Pytorch, so please follow the official instructions in here. The code is tested under pytorch1.0 and Python 3.6 on Ubuntu 16.04.

Usage

Training set & Testing set Split

Please refer to the Section 6 "Speech Corpus of Mead" in the supplementary material. The speech corpora are basically divided into 3 parts, (i.e., common, generic, and emotion-related). For each intensity level, we directly use the last 10 sentences of neutral category and the last 6 sentences of the other seven emotion categories as the testing set. Note that all the sentences in the testing set come from the "emotion-related" part. Meanwhile if you are trying to manipulate the emotion category, you can use all the 40 sentences of neutral category as the input samples.

Training

Download the dataset from here. We package the audio-visual data of each actor in a single folder named after "MXXX" or "WXXX", where "M" and "W" indicate actor and actress, respectively.
As Mead requires different modules to achieve different functions, thus we seperate the training for Mead into three stages. In each stage, the corresponding configuration (.yaml file) should be set up accordingly, and used as below:

Stage 1: Audio-to-Landmarks Module

cd Audio2Landmark
python train.py --config config.yaml

Stage 2: Neutral-to-Emotion Transformer

cd Neutral2Emotion
python train.py --config config.yaml

Stage 3: Refinement Network

cd Refinement
python train.py --config config.yaml

Testing

First, download the pretrained models and put them in models folder.
Second, download the demo audio data.
Run the following command to generate a talking sequence with a specific emotion

cd Refinement
python demo.py --config config_demo.yaml

You can try different emotions by replacing the number with other integers from 0~7.

0:angry
1:disgust
2:contempt
3:fear
4:happy
5:sad
6:surprised
7:neutral

In addition, you can also try compound emotion by setting up two different emotions at the same time.

The results are stored in outputs folder.

Citation

If you find this code useful for your research, please cite our paper:

@inproceedings{kaisiyuan2020mead,
 author = {Wang, Kaisiyuan and Wu, Qianyi and Song, Linsen and Yang, Zhuoqian and Wu, Wayne and Qian, Chen and He, Ran and Qiao, Yu and Loy, Chen Change},
 title = {MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation},
 booktitle = {ECCV},
 month = Augest,
 year = {2020}
}

MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]

Related tags

Overview

MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]

Introduction

Multi-view Emotional Audio-visual Dataset

Installation

Usage

Training set & Testing set Split

Training

Stage 1: Audio-to-Landmarks Module

Stage 2: Neutral-to-Emotion Transformer

Stage 3: Refinement Network

Testing

Citation

Owner

Galvanalyser is a system for automatically storing data generated by battery cycling machines in a database

An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks

This is a python script to navigate and extract the FSD50K dataset

Python Kalman filtering and optimal estimation library. Implements Kalman filter, particle filter, Extended Kalman filter, Unscented Kalman filter, g-h (alpha-beta), least squares, H Infinity, smoothers, and more. Has companion book 'Kalman and Bayesian Filters in Python'.

talkbox is a scikit for signal/speech processing, to extend scipy capabilities in that domain.

Data collection, enhancement, and metrics calculation.

Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

ForecastGA is a Python tool to forecast Google Analytics data using several popular time series models.

The Dash Enterprise App Gallery "Oil & Gas Wells" example

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

A forecasting system dedicated to smart city data

Sensitivity Analysis Library in Python (Numpy). Contains Sobol, Morris, Fractional Factorial and FAST methods.

Evaluation of a Monocular Eye Tracking Set-Up

CleanX is an open source python library for exploring, cleaning and augmenting large datasets of X-rays, or certain other types of radiological images.

small package with utility functions for analyzing (fly) calcium imaging data

collect training and calibration data for gaze tracking

Analyzing Earth Observation (EO) data is complex and solutions often require custom tailored algorithms.

Single-Cell Analysis in Python. Scales to >1M cells.

Methylation/modified base calling separated from basecalling.

This python script allows you to manipulate the audience data from Sl.ido surveys