Identify the emotion of multiple speakers in an Audio Segment

Overview

PR MIT License made-with-python


Logo

MevonAI - Speech Emotion Recognition

Identify the emotion of multiple speakers in a Audio Segment
Report Bug ยท Request Feature

Try the Demo Here

Open In Colab

Table of Contents

About The Project

Logo

The main aim of the project is to Identify the emotion of multiple speakers in a call audio as a application for customer satisfaction feedback in call centres.

Built With

Getting Started

Follow the Below Instructions for setting the project up on your local Machine.

Installation

  1. Create a python virtual environment
sudo apt install python3-venv
mkdir mevonAI
cd mevonAI
python3 -m venv mevon-env
source mevon-env/bin/activate
  1. Clone the repo
git clone https://github.com/SuyashMore/MevonAI-Speech-Emotion-Recognition.git
  1. Install Dependencies
cd MevonAI-Speech-Emotion-Recognition/
cd src/
sudo chmod +x setup.sh
./setup.sh

Running the Application

  1. Add audio files in .wav format for analysis in src/input/ folder

  2. Run Speech Emotion Recognition using

python3 speechEmotionRecognition.py
  1. By Default , the application will use the Pretrained Model Available in "src/model/"

  2. Diarized files will be stored in "src/output/" folder

  3. Predicted Emotions will be stored in a separate .csv file in src/ folder

Here's how it works:

Speaker Diarization

  • Speaker diarisation (or diarization) is the process of partitioning an input audio stream into homogeneous segments according to the speaker identity. It can enhance the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speakerโ€™s true identity. It is used to answer the question "who spoke when?" Speaker diarisation is a combination of speaker segmentation and speaker clustering. The first aims at finding speaker change points in an audio stream. The second aims at grouping together speech segments on the basis of speaker characteristics.

Logo

Feature Extraction

  • When we do Speech Recognition tasks, MFCCs is the state-of-the-art feature since it was invented in the 1980s.This shape determines what sound comes out. If we can determine the shape accurately, this should give us an accurate representation of the phoneme being produced. The shape of the vocal tract manifests itself in the envelope of the short time power spectrum, and the job of MFCCs is to accurately represent this envelope.

Logo

The Above Image represents the audio Waveform , the below image shows the converted MFCC Output on which we will Run our CNN Model.

CNN Model

  • Use Convolutional Neural Network to recognize emotion on the MFCCs with the following Architecture
model = Sequential()

#Input Layer
model.add(Conv2D(32, 5,strides=2,padding='same',
                 input_shape=(13,216,1)))
model.add(Activation('relu'))
model.add(BatchNormalization())

#Hidden Layer 1
model.add(Conv2D(64, 5,strides=2,padding='same',))
model.add(Activation('relu'))
model.add(BatchNormalization())

#Hidden Layer 2
model.add(Conv2D(64, 5,strides=2,padding='same',))
model.add(Activation('relu'))
model.add(BatchNormalization())

#Flatten Conv Net
model.add(Flatten())

#Output Layer
model.add(Dense(7))
model.add(Activation('softmax'))

Training the Model

Contributing

Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgements

FAQ

  • How do I do specifically so and so?
    • Create an Issue to this repo , we will respond to the query
commonfate ๐Ÿ“ฆcommonfate ๐Ÿ“ฆ - Common Fate Model and Transform.

Common Fate Transform and Model for Python This package is a python implementation of the Common Fate Transform and Model to be used for audio source

Fabian-Robert Stรถter 18 Jan 08, 2022
Tradutor de um arquivo MIDI para ser usado em um simulador RISC-V(RARS)

Tradutor_MIDI-RISC-V Tradutor de um arquivo MIDI para ser usado em um simulador RISC-V(RARS) *O resultado sai com essa formataรงรฃo: nota,duraรงรฃo,nota,d

Gabriel B. G. 4 Sep 02, 2022
๐ŸŽต A repository for manually annotating files to create labeled acoustic datasets for machine learning.

๐ŸŽต A repository for manually annotating files to create labeled acoustic datasets for machine learning.

Jim Schwoebel 28 Dec 22, 2022
The project aims to develop a personal-assistant for Windows & Linux-based systems

The project aims to develop a personal-assistant for Windows & Linux-based systems. Samiksha draws its inspiration from virtual assistants like Cortana for Windows, and Siri for iOS. It has been desi

SHUBHANSHU RAI 1 Jan 16, 2022
a library for audio and music analysis

aubio aubio is a library to label music and sounds. It listens to audio signals and attempts to detect events. For instance, when a drum is hit, at wh

aubio 2.9k Dec 30, 2022
A Quick Music Player Made Fully in Python

Quick Music Player Made Fully In Python. Pure Python, cross platform, single function module with no dependencies for playing sounds. Installation & S

1 Dec 24, 2021
Stream Music ๐ŸŽต ๐˜ผ ๐™—๐™ค๐™ฉ ๐™ฉ๐™๐™–๐™ฉ ๐™˜๐™–๐™ฃ ๐™ฅ๐™ก๐™–๐™ฎ ๐™ข๐™ช๐™จ๐™ž๐™˜ ๐™ค๐™ฃ ๐™๐™š๐™ก๐™š๐™œ๐™ง๐™–๐™ข ๐™‚๐™ง๐™ค๐™ช๐™ฅ ๐™–๐™ฃ๐™™ ๐˜พ๐™๐™–๐™ฃ๐™ฃ๐™š๐™ก ๐™‘๐™ค๐™ž๐™˜๐™š ๐˜พ๐™๐™–๐™ฉ๐™จ ๐˜ผ๐™ซ๐™–๐™ž๐™ก?

Stream Music ๐ŸŽต ๐˜ผ ๐™—๐™ค๐™ฉ ๐™ฉ๐™๐™–๐™ฉ ๐™˜๐™–๐™ฃ ๐™ฅ๐™ก๐™–๐™ฎ ๐™ข๐™ช๐™จ๐™ž๐™˜ ๐™ค๐™ฃ ๐™๐™š๐™ก๐™š๐™œ๐™ง๐™–๐™ข ๐™‚๐™ง๐™ค๐™ช๐™ฅ ๐™–๐™ฃ๐™™ ๐˜พ๐™๐™–๐™ฃ๐™ฃ๐™š๐™ก ๐™‘๐™ค๐™ž๐™˜๐™š ๐˜พ๐™๐™–๐™ฉ๐™จ ๐˜ผ๐™ซ๐™–๐™ž๐™ก?

Sadew Jayasekara 15 Nov 12, 2022
Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose"

Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose We provide PyTorch implementations for our arxiv paper "Audio-dr

Ran Yi 497 Jan 09, 2023
MusicBrainz Picard

MusicBrainz Picard MusicBrainz Picard is a cross-platform (Linux/Mac OS X/Windows) application written in Python and is the official MusicBrainz tagge

MetaBrainz Foundation 3k Dec 31, 2022
Linear Prediction Coefficients estimation from mel-spectrogram implemented in Python based on Levinson-Durbin algorithm.

LPC_for_TTS Linear Prediction Coefficients estimation from mel-spectrogram implemented in Python based on Levinson-Durbin algorithm. ๅŸบไบŽLevinson-Durbin

Zewang ZHANG 58 Nov 17, 2022
Full LAKH MIDI dataset converted to MuseNet MIDI output format (9 instruments + drums)

LAKH MuseNet MIDI Dataset Full LAKH MIDI dataset converted to MuseNet MIDI output format (9 instruments + drums) Bonus: Choir on Channel 10 Please CC

Alex 6 Nov 20, 2022
python wrapper for rubberband

pyrubberband A python wrapper for rubberband. For now, this just provides lightweight wrappers for pitch-shifting and time-stretching. All processing

Brian McFee 106 Nov 28, 2022
GiantMIDI-Piano is a classical piano MIDI dataset contains 10,854 MIDI files of 2,786 composers

GiantMIDI-Piano is a classical piano MIDI dataset contains 10,854 MIDI files of 2,786 composers

Bytedance Inc. 1.3k Jan 04, 2023
Code for paper 'Audio-Driven Emotional Video Portraits'.

Audio-Driven Emotional Video Portraits [CVPR2021] Xinya Ji, Zhou Hang, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu [Project] [Paper] G

197 Dec 31, 2022
Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)

Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)

Meinard Mueller 66 Jan 02, 2023
A Simple Script that will help you to Play / Change Songs with just your Voice

Auto-Spotify using Voice Recognition A Simple Script that will help you to Play / Change Songs with just your Voice Explore the docs ยป Table of Conten

Mehul Shah 1 Nov 21, 2021
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Summary Pyroomacoustics is a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the pack

Audiovisual Communications Laboratory 1k Jan 09, 2023
Audio2midi - Automatic Audio-to-symbolic Arrangement

Automatic Audio-to-symbolic Arrangement This is the repository of the project "A

Ziyu Wang 24 Dec 05, 2022
A python wrapper for REAPER

pyreaper A python wrapper for REAPER (Robust Epoch And Pitch EstimatoR) Installation pip install pyreaper Demonstration notebnook http://nbviewer.jupy

Ryuichi Yamamoto 56 Dec 27, 2022
Audio spatialization over WebRTC and JACK Audio Connection Kit

Audio spatialization over WebRTC Spatify provides a framework for building multichannel installations using WebRTC.

Bruno Gola 34 Jun 29, 2022