Audio Visual Emotion Recognition using TDA

Overview

Two datasets from the RAVDESS database are analyzed, one video and one audio:

Audio-Dataset: https://www.kaggle.com/uwrfkaggler/ravdess-emotional-speech-audio

Video-Dataset: https://zenodo.org/record/1188976#.X7yio2hKjIU

The final Master's project document (PDF) is available here.

Folder Video_Dataset:

The dataset used is available at https://zenodo.org/record/1188976#.X7yio2hKjIU

The algorithm runs in this order:

  1. delaunay_construction.m: The first step builds the Delaunay triangulation for every video in the dataset. Recall that there are videos of 24 people, with 60 videos per person covering 8 emotions. First, set pathdata to the location of the dataset, which is in CSV format and contains the facial landmark points. The X coordinates of the points are in positions 2:297 and the Y coordinates in 138:416. The function returns Delaunay_base, the struct used in the rest of the code.

  2. complex_filtration.m: Once delaunay_construction has run, apply complex_filtration(Delaunay). The input is the Delaunay triangulation. This code builds the complexes from the triangulation, taking the edges that form the squares and using them to form the squares in every frame. We work with 9 frames, and this function calls the filtration function. It returns the complex associated with each video, together with the index position at which each 3-cell is formed in the complex.

2.1. filtrations.m: This function obtains 8 filtered border simplicial complexes from 4 view directions, 2 per direction. A set of functions is applied to get the different complexes: the function returns Complex X in the direction of the X axis, Complex X in the direction of Y, Complex XY and Complex YX in the diagonal directions, and the same complexes with the order inverted.

2.2. complex_wtsquare.m: This function splits the complexes that form every cell, in order to see the features that are born and die in the same square of the complex.

  3. WORKFLOW.m: Once the complexes are built, we apply the incremental algorithm (Persistence_new) used in this thesis. The incremental algorithm was implemented in C++ using several of the topology libraries available for that language. This yields the barcode or persistence diagram associated with each filtered complex obtained at the beginning. This function also applies per_entropy to summarise the information in the persistence diagram.

Load each complex and its index and apply:

3.1 complex2matrix.py: converts the complex into the matrix form required by the ATR model, as explained in the thesis (page 50).

3.2 Persistence_new: ATR model implemented in C++ to calculate the persistent homology and get the barcode or persistence diagram associated with each filtration of the complex. The pseudo-code of the algorithm can be found in the thesis.

3.3 create_matrix.m: Builds the different matrices, based on the persistence values, that are used for classification.

  1. experiment: the first experiment, based on the entropy values of each video. It takes into account each filtered complex obtained, so we work with a vector of eight elements, one per filtration. This matrix is then split into training and test sets in order to use the classification app from Matlab and get the accuracy.

  2. experiment3: experiment that builds the matrix from the persistence values associated with one filtration of the computed complex. This matrix is then split into training and test sets in order to use the classification app from Matlab and get the accuracy.

  3. feature24_vector.m: experiment that considers a vector of 24 features for each person. This experiment did not give good results.
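The entropy-based feature vector described above can be sketched in a few lines. This is a minimal illustration and not the repository's per_entropy code: it assumes a barcode is a list of (birth, death) pairs, uses the natural logarithm, and skips bars with infinite death. The names persistent_entropy and entropy_feature_vector, and the toy barcodes, are hypothetical.

```python
import math


def persistent_entropy(intervals):
    """Shannon entropy of the normalised bar lengths of one barcode.

    Assumption: bars with infinite death or zero length are ignored.
    """
    lengths = [d - b for b, d in intervals if math.isfinite(d) and d > b]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in lengths)


def entropy_feature_vector(barcodes):
    """One entropy value per filtration, e.g. 8 values for 8 filtrations."""
    return [persistent_entropy(barcode) for barcode in barcodes]


# Toy input: 8 made-up barcodes standing in for the 8 filtered complexes.
toy_barcodes = [[(0.0, 1.0), (0.5, 2.0), (1.0, 1.5)] for _ in range(8)]
features = entropy_feature_vector(toy_barcodes)
print(len(features), features[0])
```

Each video then contributes one such vector as a row of the matrix that is split into training and test sets.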

Folder Audio Dataset:

The audio dataset used for this implementation is available at the following URL; the files are in .wav format: https://www.kaggle.com/uwrfkaggler/ravdess-emotional-speech-audio

Experiment 1

  1. work_flow.py: covers the first experiment. It loads the data used in the script and initializes the dataframe to fill.

1.1 test.py: uses the function emotions to get the embedding and the duration in seconds of each audio signal. It reads the audio and creates the time array (timeline), resamples the signal, gets the optimal delay parameter, and applies the embedding algorithm.

1.2 get_parameters.py: functions to get the optimal parameters for the Takens embedding. It contains datDelayInformation, which uses mutual information for the delay, and false_nearest_neighours for the embedding dimension.

1.3 TakensEmbedding: This function returns the Takens embedding of the data for a given delay and embedding dimension.

1.4 per_entropy.py: Computes the persistent entropy of the set of intervals in the diagram obtained.

1.5 get_diagramas.py: used to apply the Vietoris-Rips filtration and get the persisten_entropy values.

  2. machine_learning.py: defines the classification techniques applied to the set of entropy values. It creates the training and test splits and imports KNeighborsClassifier from the library. The error rate is plotted against the parameter K, computing the mean error over all predicted values for K ranging from 1 to 40.
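The Takens embedding at the core of this experiment (step 1.3) can be sketched as follows. This is a dependency-free illustration, not the repository's TakensEmbedding function: the name takens_embedding is hypothetical, and the delay and dimension are passed by hand here, whereas the project selects them with mutual information and false nearest neighbours.

```python
def takens_embedding(signal, delay, dimension):
    """Return the delay vectors (x[t], x[t+delay], ..., x[t+(dimension-1)*delay])."""
    if delay < 1 or dimension < 1:
        raise ValueError("delay and dimension must be positive integers")
    n_points = len(signal) - (dimension - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this delay and dimension")
    return [
        tuple(signal[t + k * delay] for k in range(dimension))
        for t in range(n_points)
    ]


# Toy signal standing in for a resampled audio clip.
toy_signal = [0, 1, 2, 3, 4, 5]
point_cloud = takens_embedding(toy_signal, delay=2, dimension=3)
print(point_cloud)
```

The resulting point cloud is the kind of input on which a Vietoris-Rips filtration is then computed.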

Experiment 2

  3. Work_flow2.py: second experiment, using the function emotions_second to obtain the resampled signal and get_diag2 from test.py to calculate the Vietoris-Rips filtration.

  4. machine_learning_second: constructs a distance matrix of persistence diagrams (bottleneck distance). It loads the CSV file prueba5.csv, which contains the emotion label associated with each row of the matrix, and creates the fake data matrix, which holds just the indices of the time series. KNeighborsClassifier is imported from the library. The algorithm is evaluated with the most commonly used measures: confusion matrix, precision, recall and F1 score. Different classifiers are tested to see which one is best: GaussianNB, DecisionTreeClassifier, KNN and SVC.

4.1 my_dist: gets the bottleneck distance between diagrams. This function is used to build the distance matrix that is the input of the KNN algorithm.
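The "fake data matrix" idea can be illustrated with a small sketch: each sample is represented only by its index, and the metric looks the distance up in a precomputed matrix. This is a plain-Python illustration under stated assumptions, not the repository's code: make_index_metric and knn_predict are hypothetical names, the distances are made-up numbers rather than real bottleneck distances, and the labels are toy values.

```python
from collections import Counter


def make_index_metric(distance_matrix):
    """Build a metric that takes two sample indices and looks up their distance."""
    def my_dist(i, j):
        return distance_matrix[i][j]
    return my_dist


def knn_predict(train_idx, train_labels, test_idx, metric, k=3):
    """Majority vote among the k training samples closest to each test sample."""
    predictions = []
    for q in test_idx:
        ranked = sorted(zip(train_idx, train_labels),
                        key=lambda pair: metric(q, pair[0]))
        votes = Counter(label for _, label in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy symmetric distance matrix for 5 diagrams (placeholder values).
D = [
    [0.0, 0.1, 0.9, 0.8, 0.2],
    [0.1, 0.0, 0.8, 0.9, 0.7],
    [0.9, 0.8, 0.0, 0.1, 0.85],
    [0.8, 0.9, 0.1, 0.0, 0.9],
    [0.2, 0.7, 0.85, 0.9, 0.0],
]
train_idx = [0, 1, 2, 3]
train_labels = ["calm", "calm", "angry", "angry"]
predictions = knn_predict(train_idx, train_labels, [4], make_index_metric(D), k=3)
print(predictions)
```

Passing indices instead of the diagrams themselves keeps the classifier input a regular array, while the metric does the real work of comparing diagrams.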

Classification folder

This folder provides the persistent entropy matrices and the classification experiments using neural networks for the video-only and audio-video datasets.

Owner
Combinatorial Image Analysis research group