Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

Last update: Nov 12, 2022

Overview

Light-SERNet

This is the Tensorflow 2.x implementation of our paper "Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition", submitted in ICASSP 2022.

In this paper, we propose an efficient and lightweight fully convolutional neural network(FCNN) for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various feature maps are extracted via three parallel paths with different filter sizes. This helps deep convolution blocks to extract high-level features, while ensuring sufficient separability. The extracted features are used to classify the emotion of the input speech segment. While our model has a smaller size than that of the state-of-the-art models, it achieves a higher performance on the IEMOCAP and EMO-DB datasets.

Run

1. Clone Repository

$ git clone https://github.com/AryaAftab/LIGHT-SERNET.git
$ cd LIGHT-SERNET/

2. Requirements

Tensorflow >= 2.3.0
Numpy >= 1.19.2
Tqdm >= 4.50.2
Matplotlib> = 3.3.1
Scikit-learn >= 0.23.2

$ pip install -r requirements.txt

3. Data:

Download EMO-DB and IEMOCAP(requires permission to access) datasets
extract them in data folder

4. Prepare datasets :

Use the following code to convert each dataset to the desired size(second):

$ python utils/segment/segment_dataset.py -dp data/{dataset_folder} -ip utils/DATASET_INFO.json -d {datasetname_in_jsonfile} -l {desired_size(seconds)}

For example, for EMO-DB Dataset :

$ python utils/segment/segment_dataset.py -dp data/EMO-DB -ip utils/DATASET_INFO.json -d EMO-DB -l 3

5. Set hyperparameters and training config :

You only need to change the constants in the hyperparameters.py to set the hyperparameters and the training config.

6. Strat training:

Use the following code to train the model on the desired dataset with the desired cost function.

Note 1: The database name is the name of the database folder after segmentation.
Note 2: The results for the confusion matrix are saved in the result folder.

$ python train.py -dn {dataset_name_after_segmentation} -ln {cost_function_name}

For example, for EMO-DB Dataset :

$ python train.py -dn EMO-DB_3s_Segmented -ln focal

Citation

If you find our code useful for your research, please consider citing:

@article{aftab2021light,
  title={Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition},
  author={Aftab, Arya and Morsali, Alireza and Ghaemmaghami, Shahrokh and Champagne, Benoit},
  journal={arXiv preprint arXiv:2110.03435},
  year={2021}
}

Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

Related tags

Overview

Light-SERNet

Run

1. Clone Repository

2. Requirements

3. Data:

4. Prepare datasets :

5. Set hyperparameters and training config :

6. Strat training:

Citation

Owner

Arya Aftab

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose (CVPR 2021)

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

atmaCup #11 の Public 4th / Pricvate 5th Solution のリポジトリです。

Code examples and benchmarks from the paper "Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective"

A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing"

Fuzzing tool (TFuzz): a fuzzing tool based on program transformation

PyTorch GPU implementation of the ES-RNN model for time series forecasting

Official implementation of the paper "Lightweight Deep CNN for Natural Image Matting via Similarity Preserving Knowledge Distillation"

MT-GAN-PyTorch - PyTorch Implementation of Learning to Transfer: Unsupervised Domain Translation via Meta-Learning

Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding (AAAI 2020) - PyTorch Implementation

Code and dataset for ACL2018 paper "Exploiting Document Knowledge for Aspect-level Sentiment Classification"

Source Code for DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances (https://arxiv.org/pdf/2012.01775.pdf)

DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

This project uses ViT to perform image classification tasks on DATA set CIFAR10.

Official implementation of paper Gradient Matching for Domain Generalization

PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch.

Pacman-AI - AI project designed by UC Berkeley. Designed reflex and minimax agents for the game Pacman.

Quasi-Dense Similarity Learning for Multiple Object Tracking, CVPR 2021 (Oral)

Progressive Image Deraining Networks: A Better and Simpler Baseline

FIRM-AFL is the first high-throughput greybox fuzzer for IoT firmware.