Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

Last update: Nov 12, 2022

Overview

Light-SERNet

This is the TensorFlow 2.x implementation of our paper "Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition", submitted to ICASSP 2022.

In this paper, we propose an efficient and lightweight fully convolutional neural network (FCNN) for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various feature maps are extracted via three parallel paths with different filter sizes. This helps the deep convolution blocks extract high-level features while ensuring sufficient separability. The extracted features are used to classify the emotion of the input speech segment. Although our model is smaller than state-of-the-art models, it achieves higher performance on the IEMOCAP and EMO-DB datasets.
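To make the idea of parallel paths with different filter sizes concrete, here is a toy, dependency-free sketch. It is not the model from the paper: it works on a 1-D signal, and the three averaging kernels are hypothetical values chosen only for illustration. Each path sees the same input, keeps the input length through zero padding, and yields its own feature map, so the maps can be stacked afterwards.

```python
def conv1d_same(signal, kernel):
    """Cross-correlate `signal` with `kernel`, zero-padding so the output keeps the input length."""
    k = len(kernel)
    left = (k - 1) // 2
    padded = [0.0] * left + list(signal) + [0.0] * (k - 1 - left)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def parallel_paths(signal, kernels):
    """Apply one convolution per kernel to the same input and collect the feature maps."""
    return [conv1d_same(signal, kernel) for kernel in kernels]


# Hypothetical averaging kernels of widths 1, 3 and 5 (assumption, not the paper's filters).
kernels = [[1.0], [1 / 3] * 3, [1 / 5] * 5]
signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

features = parallel_paths(signal, kernels)
for kernel, feature_map in zip(kernels, features):
    print(len(kernel), [round(v, 2) for v in feature_map])
```

Wider kernels spread the single spike over more positions, which is the sense in which each path captures context at a different scale.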

Run

1. Clone Repository

$ git clone https://github.com/AryaAftab/LIGHT-SERNET.git
$ cd LIGHT-SERNET/

2. Requirements

TensorFlow >= 2.3.0
NumPy >= 1.19.2
tqdm >= 4.50.2
Matplotlib >= 3.3.1
scikit-learn >= 0.23.2

$ pip install -r requirements.txt
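After installing, you may want to confirm that the environment satisfies the minimum versions listed above. The following is a minimal sketch using only the standard library (Python 3.8+). The helper names `parse_version`, `meets_minimum` and `check_requirements` are hypothetical and not part of this repository, and the version parsing is deliberately simple (it keeps only the leading digits of each dot-separated component).

```python
from importlib import metadata

# Minimum versions copied from the requirements list in this README.
MINIMUM_VERSIONS = {
    "tensorflow": "2.3.0",
    "numpy": "1.19.2",
    "tqdm": "4.50.2",
    "matplotlib": "3.3.1",
    "scikit-learn": "0.23.2",
}


def parse_version(text):
    """Turn '2.10.0' into (2, 10, 0), keeping only the leading digits of each component."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = (None, False)
            continue
        report[package] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_requirements().items():
        status = "OK" if ok else "MISSING or too old"
        print(f"{package}: installed={installed} -> {status}")
```

Note that a package installed under a different distribution name (for example a CPU-only TensorFlow build) would be reported as missing by this sketch.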

3. Data

Download the EMO-DB and IEMOCAP (requires permission to access) datasets and extract them into the data folder.

4. Prepare datasets

Use the following command to split each dataset into segments of the desired length (in seconds):

$ python utils/segment/segment_dataset.py -dp data/{dataset_folder} -ip utils/DATASET_INFO.json -d {datasetname_in_jsonfile} -l {desired_size(seconds)}

For example, for the EMO-DB dataset:

$ python utils/segment/segment_dataset.py -dp data/EMO-DB -ip utils/DATASET_INFO.json -d EMO-DB -l 3
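For intuition about what fixed-length segmentation means, here is a small sketch of one plausible scheme: consecutive, non-overlapping windows, with the last partial window zero-padded. This is an assumption made for illustration; it is not taken from `segment_dataset.py`, which may handle short or trailing audio differently. The function name `segment_signal` is hypothetical.

```python
def segment_signal(samples, sample_rate, seconds, pad_value=0.0):
    """Split `samples` into consecutive chunks of `seconds` each, padding the last chunk."""
    segment_len = int(round(sample_rate * seconds))
    if segment_len <= 0:
        raise ValueError("segment length must be positive")
    segments = []
    for start in range(0, len(samples), segment_len):
        chunk = list(samples[start:start + segment_len])
        # Assumed behavior: pad the trailing chunk so every segment has equal length.
        chunk += [pad_value] * (segment_len - len(chunk))
        segments.append(chunk)
    return segments


# Toy example: a 7-sample "recording" at 2 samples per second, cut into 2-second pieces.
toy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
for piece in segment_signal(toy, sample_rate=2, seconds=2):
    print(piece)
```

With real audio the same arithmetic applies: the segment length in samples is the sample rate multiplied by the value passed to `-l`.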

5. Set hyperparameters and training config

To set the hyperparameters and the training config, you only need to change the constants in hyperparameters.py.

6. Start training

Use the following command to train the model on the desired dataset with the desired cost function.

Note 1: The dataset name is the name of the dataset folder created by segmentation.
Note 2: The confusion matrix results are saved in the result folder.

$ python train.py -dn {dataset_name_after_segmentation} -ln {cost_function_name}

For example, for the EMO-DB dataset:

$ python train.py -dn EMO-DB_3s_Segmented -ln focal
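The `focal` option in the example refers to focal loss. As a rough illustration of how it differs from plain cross-entropy, the sketch below computes the standard per-example form, -(1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The value gamma = 2.0 is an assumed default for illustration, and any class weighting the repository may apply is omitted; check hyperparameters.py and the training code for the settings actually used.

```python
import math


def focal_loss(probs, true_index, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)**gamma * log(p_t)."""
    # Clamp p_t away from zero so the logarithm stays finite.
    p_t = min(max(probs[true_index], 1e-12), 1.0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(probs, true_index):
    """Plain cross-entropy is the gamma = 0 special case of focal loss."""
    return focal_loss(probs, true_index, gamma=0.0)


confident = [0.9, 0.05, 0.05]   # true class (index 0) predicted with high probability
uncertain = [0.4, 0.3, 0.3]     # true class (index 0) predicted with low probability

for name, probs in (("confident", confident), ("uncertain", uncertain)):
    print(name, round(cross_entropy(probs, 0), 4), round(focal_loss(probs, 0), 4))
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified examples much more than that of hard ones, so training focuses on the examples the model still gets wrong.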

Citation

If you find our code useful for your research, please consider citing:

@article{aftab2021light,
  title={Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition},
  author={Aftab, Arya and Morsali, Alireza and Ghaemmaghami, Shahrokh and Champagne, Benoit},
  journal={arXiv preprint arXiv:2110.03435},
  year={2021}
}
