Animal Sound Classification (Cats Vrs Dogs Audio Sentiment Classification)

Overview

Animal Sound Classification (Cats Vrs Dogs Audio Sentiment Classification)

This is a simple audio classification api build to classify the sound of an audio, weather it is the cat or dog sound.

alt

Response

Given a .wav audio the model will classify what does the sound the audio belongs to either cat or dog.

{
  "predictions": {
    "class": "dog",
    "label": 1,
    "probability": 1.0
  },
  "success": true
}

Starting the server

To start server and start audio classification first you need to make sure you are in the server folder and run the following commands:

  1. creating a virtual environment
virtualenv venv && .\venv\Scripts\activate.bat
  1. installing packages
pip install -r requirements.txt
  1. Starting the server
python api/app.py

The server will start on a default port of 3001 and you will be able to make api request to the server to do audio classification.

Model Metrics

The following table shows all the metrics summary we get after training the model for few 15 epochs.

model name model description test accuracy validation accuracy train accuracy test loss validation loss train loss
cats-dogs-sound-cnn.pt audio sentiment classification for dogs and cats CNN. 90.7% 90.7% 93.5% 0.621 0.218 0.209

Classification report

The following is the classification report for the model on the test dataset.

# precision recall f1-score support
accuracy - - 90% 2305
macro avg 91% 90% 90% 2305
weighted avg 92% 89% 90% 2305

Confusion matrix

The following figure shows a confusion matrix for the classification model.

Audio Sentiment classification

If you hit the server at http://localhost:3001/classify you will be able to get the following expected response that is if the request method is POST and you provide the file expected by the server.

Expected Response

The expected response at http://localhost:3001/classify with a file audio of the right format will yield the following json response to the client.

{
  "predictions": {
    "class": "dog",
    "label": 1,
    "probability": 1.0
  },
  "success": true
}

Using curl

Make sure that you have the audio named cat.wav in the current folder that you are running your cmd otherwise you have to provide an absolute or relative path to the audio.

To make a curl POST request at http://localhost:3001/classify with the file cat.wav we run the following command.

# for cat
curl -X POST -F [email protected] http://127.0.0.1:3001/classify

# for dog
curl -X POST -F [email protected] http://127.0.0.1:3001/classify

Using Postman client

To make this request with postman we do it as follows:

  1. Change the request method to POST at http://127.0.0.1:3001/classify
  2. Click on form-data
  3. Select type to be file on the KEY attribute
  4. For the KEY type audio and select the audio you want to predict under value
  5. Click send

If everything went well you will get the following response depending on the face you have selected:

{
  "predictions": { "class": "dog", "label": 1, "probability": 1.0 },
  "success": true
}

Using JavaScript fetch api.

  1. First you need to get the input from html
  2. Create a formData object
  3. make a POST requests
res.json()) .then((data) => console.log(data));">
const input = document.getElementById("input").files[0];
let formData = new FormData();
formData.append("audio", input);
fetch("http://127.0.0.1:3001/classify", {
  method: "POST",
  body: formData,
})
  .then((res) => res.json())
  .then((data) => console.log(data));

If everything went well you will be able to get expected response.

{
  "predictions": { "class": "dog", "label": 1, "probability": 1.0 },
  "success": true
}

Notebooks

  • All notebooks for training and saving the models are found in the notebooks folder of this repository.
Owner
crispengari
ai || software development. (creator of initialiseur)
crispengari
Backdoor Attack through Frequency Domain

Backdoor Attack through Frequency Domain DEPENDENCIES python==3.8.3 numpy==1.19.4 tensorflow==2.4.0 opencv==4.5.1 idx2numpy==1.2.3 pytorch==1.7.0 Data

5 Jun 18, 2022
Open-source code for Generic Grouping Network (GGN, CVPR 2022)

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity Pytorch implementation for "Open-World Instance Segmen

Meta Research 99 Dec 06, 2022
Paper list of log-based anomaly detection

Paper list of log-based anomaly detection

Weibin Meng 411 Dec 05, 2022
PyTorch implementation of DCT fast weight RNNs

DCT based fast weights This repository contains the official code for the paper: Training and Generating Neural Networks in Compressed Weight Space. T

Kazuki Irie 4 Dec 24, 2022
Vision Transformer and MLP-Mixer Architectures

Vision Transformer and MLP-Mixer Architectures Update (2.7.2021): Added the "When Vision Transformers Outperform ResNets..." paper, and SAM (Sharpness

Google Research 6.4k Jan 04, 2023
Project Tugas Besar pertama Pengenalan Komputasi Institut Teknologi Bandung

Vending_Machine_(Mesin_Penjual_Minuman) Project Tugas Besar pertama Pengenalan Komputasi Institut Teknologi Bandung Raw Sketch untuk Essay Ringkasan P

QueenLy 1 Nov 08, 2021
Hybrid Neural Fusion for Full-frame Video Stabilization

FuSta: Hybrid Neural Fusion for Full-frame Video Stabilization Project Page | Video | Paper | Google Colab Setup Setup environment for [Yu and Ramamoo

Yu-Lun Liu 430 Jan 04, 2023
Resilience from Diversity: Population-based approach to harden models against adversarial attacks

Resilience from Diversity: Population-based approach to harden models against adversarial attacks Requirements To install requirements: pip install -r

0 Nov 23, 2021
[CVPR 2021] Forecasting the panoptic segmentation of future video frames

Panoptic Segmentation Forecasting Colin Graber, Grace Tsai, Michael Firman, Gabriel Brostow, Alexander Schwing - CVPR 2021 [Link to paper] We propose

Niantic Labs 44 Nov 29, 2022
Waymo motion prediction challenge 2021: 3rd place solution

Waymo motion prediction challenge 2021: 3rd place solution πŸ“œ Technical report πŸ—¨οΈ Presentation πŸŽ‰ Announcement πŸ›†Motion Prediction Channel Website πŸ›†

158 Jan 08, 2023
PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds

PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds PCAM: Product of Cross-Attention Matrices for Rigid Registration of P

valeo.ai 24 May 31, 2022
A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation

A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation This repository contains the source code of the paper A Differentiable

Bernardo Aceituno 2 May 05, 2022
Medical image analysis framework merging ANTsPy and deep learning

ANTsPyNet A collection of deep learning architectures and applications ported to the python language and tools for basic medical image processing. Bas

Advanced Normalization Tools Ecosystem 118 Dec 24, 2022
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech Keon Lee, Ky

Keon Lee 114 Dec 12, 2022
Official repository for MixFaceNets: Extremely Efficient Face Recognition Networks

MixFaceNets This is the official repository of the paper: MixFaceNets: Extremely Efficient Face Recognition Networks. (Accepted in IJCB2021) https://i

Fadi Boutros 51 Dec 13, 2022
Pcos-prediction - Predicts the likelihood of Polycystic Ovary Syndrome based on patient attributes and symptoms

PCOS Prediction πŸ₯Ό Predicts the likelihood of Polycystic Ovary Syndrome based on

Samantha Van Seters 1 Jan 10, 2022
Neural style transfer in PyTorch.

style-transfer-pytorch An implementation of neural style transfer (A Neural Algorithm of Artistic Style) in PyTorch, supporting CPUs and Nvidia GPUs.

Katherine Crowson 395 Jan 06, 2023
A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching.

LPM_Python A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching. The code is established ac

AoxiangFan 11 Nov 07, 2022
INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing

INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing Existing studies on semantic parsing focus primarily on mapping a natural-la

7 Aug 22, 2022
TensorFlow implementation of "Learning from Simulated and Unsupervised Images through Adversarial Training"

Simulated+Unsupervised (S+U) Learning in TensorFlow TensorFlow implementation of Learning from Simulated and Unsupervised Images through Adversarial T

Taehoon Kim 569 Dec 29, 2022