A whale detector design for the Kaggle whale-detector challenge!

Overview

CNN (InceptionV1) + STFT based Whale Detection Algorithm

So, this repository is my PyTorch solution for the Kaggle whale-detection challenge. The objective of this challenge was to basically do a binary classification, (hence really a detection), on the existance of whale signals in the water.

It's a pretty cool problem that resonates with prior work I have done in underwater perception algorithm design - a freakishly hard problem I may add. (The speed of sound changes on you, multiple reflections from the environment, but probably the hardest of all being that it's hard to gather ground-truth). (<--- startup idea? 💥 )

Anyway! My approach is to first transform the 1D acoustic time-domain signal into a 2D time-frequency representation via the Short-Time-Fourier-Transform (STFT). We do this in the following way:

(Where K_F is the raw number of STFT frequency bands, n is the discrete time index, m is the temporal index of each STFT pixel, x[n] the raw audio signal being transformed, and k representing the index of each STFT pixel's frequency). In this way, we break the signal down into it's constituent time-frequency energy cells, (which are now pixels), but more crucially, we get a representation that has distinct features across time and frequency that will be correlated with each other. This then makes it ripe for a Convolutional Neural Network (CNN) to chew into.

Here is what a whale-signal's STFT looks like:

Pos whale spectrogram

Similarly, here's what a signal's STFT looks like without any whale signal. (Instead, there seems to be some short-time but uber wide band interference at some point in time).

Neg whale spectrogram

It's actually interesting, because there are basically so many more ways in which a signal can manifest itself as not a whale signal, VS as actually being a whale signal. Does that mean we can also frame the problem as learning the manifold of whale-signals and simply do outlier analysis on that? Something to think about. :)

Code Usage:

Ok - let us now talk about how to use the code:

The first thing you need to do is install PyTorch of course. Do this from here. I use a conda environment as they recommend, and I recommend you do the same.

Once this is done, activate your PyTorch environment.

Now we need to download the raw data. You can get that from Kaggle's site here. Unzip this data at a directory of your choosing. For the purpose of this tutorial, I am going to assume that you placed and unzipped the data as such: /Users/you/data/whaleData/. (We will only be using the training data so that we can split it into train/val/test. The reason is that we do not have access to Kaggle's test labels).

We are now going to do the following steps:

  • Convert the audio files into numpy STFT tensors:
    • python whaleDataCreatorToNumpy.py -s 1 -dataDir /Users/you/data/whaleData/train/ -labelcsv /Users/you/data/whaleData/train.csv -dataDirProcessed /Users/you/data/whaleData/processedData/ -ds 0.42 -rk 20 200
    • The -s 1 flag says we want to save the results, the -ds 0.42 says we want to downsample the STFT image by this amount, (to help with computation time), and the -rk 20 200 says that we want the "rows kept" to be indexed from 20 to 200. This is because the STFT is conjugate symmetric, but also because we make a determination by first swimming in the data, (I swear this pun is not intentional), that most of the informational content lies between those bands. (Again, the motivation is computational here as well).
  • Convert and split the STFT tensors into PyTorch training/val/test Torch tensors:
    • python whaleDataCreatorNumpyToTorchTensors.py -numpyDataDir /Users/you/data/whaleData/processedData/
    • Here, the original numpy tensors are first split and normalized, and then saved off into PyTorch tensors. (The split percentages are able to be user defined, I set the defaults set 20% for validation and 10% test). The PyTorch tensors are saved in the same directory as above.
  • Run the CNN classifier!
    • We are now ready to train the classifier! I have already designed an Inception-V1 CNN architecture, that can be loaded up automatically, and we can use this as so. The input dimensions are also guaranteed to be equal to the STFT image sizes here. At any rate, we do this like so:
    • python whaleClassifier.py -dataDirProcessed /Users/you/data/whaleData/processedData/ -g 0 -e 1 -lr 0.0002 -L2 0.01 -mb 4 -dp 0 -s 3 -dnn 'inceptionModuleV1_75x45'
    • The g term controls whether or not we want to use a GPU to trian, e controls the number of epochs we want to train over, lr is the learning rate, L2 is the L2 penalization amount for regularization, mb is the minibatch size, (which will be double this as the training composes a mini-batch to have an equal number of positive and negative samples), dp controls data parallelism (moot without multiple GPUs, and is really just a flag on whether or not to use multiple GPUs), s controls when and how often we save the net weights and validation losses, (option 3 saves the best performing model), and finally, -dnn is a flag that controls which DNN architecture we want to use. In this way, you can write your own DNN arch, and then simply call it by whatever name you give it for actual use. (I did this after I got tired of hard-coding every single DNN I designed).
    • If everything is running smoothly, you should see something like this as training progresses:
    • The "time" here just shows how long it takes between the reporting of each validation score. (Since I ran this on my CPU, it's 30 seconds / report, but expect this to be at least an order of magnitude faster on a respectable GPU).
  • Evauluate the results!
    • When your training is complete, you can then then run this script to give you automatically generated ROC and PR curves for your network's performance:
    • python resultsVisualization.py -dataDirProcessed /Users/you/data/whaleData/processedData/ -netDir .
    • After a good training session, you should get results that look like so:
    • I also show the normalized training / validation likelihoods and accuracies for the duration of the session:

So wow! An AUC of 0.9669! Not too shabby! Can still be improved, but considering the data looks like this below, our InceptionV1-CNN isn't doing too bad either. 💥

Owner
Tarin Ziyaee
Eng Manager @Facebook FRL neural interfaces | Director R&D @CTRL-labs neural inferfaces. | CTO @Voyage, autonomous vehicles | Perception @Apple Autonomous
Tarin Ziyaee
Cycle Consistent Adversarial Domain Adaptation (CyCADA)

Cycle Consistent Adversarial Domain Adaptation (CyCADA) A pytorch implementation of CyCADA. If you use this code in your research please consider citi

Hyunwoo Ko 2 Jan 10, 2022
Code of the lileonardo team for the 2021 Emotion and Theme Recognition in Music task of MediaEval 2021

Emotion and Theme Recognition in Music The repository contains code for the submission of the lileonardo team to the 2021 Emotion and Theme Recognitio

Vincent Bour 8 Aug 02, 2022
Text Summarization - WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

Text Summarization WCN — Weighted Contextual N-gram method for evaluation of Text Summarization In this project, I fine tune T5 model on Extreme Summa

Aditya Shah 1 Jan 03, 2022
The official pytorch implemention of the CVPR paper "Temporal Modulation Network for Controllable Space-Time Video Super-Resolution".

This is the official PyTorch implementation of TMNet in the CVPR 2021 paper "Temporal Modulation Network for Controllable Space-Time VideoSuper-Resolu

Gang Xu 95 Oct 24, 2022
A PyTorch-centric hybrid classical-quantum machine learning framework

torchquantum A PyTorch-centric hybrid classical-quantum dynamic neural networks framework. News Add a simple example script using quantum gates to do

MIT HAN Lab 400 Jan 02, 2023
The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch Railway

Openspoor The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch

7 Aug 22, 2022
Official pytorch implement for “Transformer-Based Source-Free Domain Adaptation”

Official implementation for TransDA Official pytorch implement for “Transformer-Based Source-Free Domain Adaptation”. Overview: Result: Prerequisites:

stanley 54 Dec 22, 2022
Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

DISCONTINUATION OF PROJECT. This project will no longer be maintained by Intel. Intel will not provide or guarantee development of or support for this

Nervana 3.9k Dec 20, 2022
Multi-Scale Progressive Fusion Network for Single Image Deraining

Multi-Scale Progressive Fusion Network for Single Image Deraining (MSPFN) This is an implementation of the MSPFN model proposed in the paper (Multi-Sc

Kuijiang 128 Nov 21, 2022
A free, multiplatform SDK for real-time facial motion capture using blendshapes, and rigid head pose in 3D space from any RGB camera, photo, or video.

mocap4face by Facemoji mocap4face by Facemoji is a free, multiplatform SDK for real-time facial motion capture based on Facial Action Coding System or

Facemoji 591 Dec 27, 2022
a baseline to practice

ccks2021_track3_baseline a baseline to practice 路径可能会有问题,自己改改 torch==1.7.1 pyhton==3.7.1 transformers==4.7.0 cuda==11.0 this is a baseline, you can fi

45 Nov 23, 2022
Final term project for Bayesian Machine Learning Lecture (XAI-623)

Mixquality_AL Final Term Project For Bayesian Machine Learning Lecture (XAI-623) Youtube Link The presentation is given in YoutubeLink Problem Formula

JeongEun Park 3 Jan 18, 2022
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR 2022)

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR2022)[paper] Authors: Chenhang He, Ruihuang Li, Shuai Li, L

Billy HE 141 Dec 30, 2022
Code for ICCV2021 paper PARE: Part Attention Regressor for 3D Human Body Estimation

PARE: Part Attention Regressor for 3D Human Body Estimation [ICCV 2021] PARE: Part Attention Regressor for 3D Human Body Estimation, Muhammed Kocabas,

Muhammed Kocabas 277 Jan 03, 2023
Repository for the Bias Benchmark for QA dataset.

BBQ Repository for the Bias Benchmark for QA dataset. Authors: Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Tho

ML² AT CILVR 18 Nov 18, 2022
Code for "Unsupervised Source Separation via Bayesian inference in the latent domain"

LQVAE-separation Code for "Unsupervised Source Separation via Bayesian inference in the latent domain" Paper Samples GT Compressed Separated Drums GT

Michele Mancusi 30 Oct 25, 2022
Author's PyTorch implementation of TD3 for OpenAI gym tasks

Addressing Function Approximation Error in Actor-Critic Methods PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3). If y

Scott Fujimoto 1.3k Dec 25, 2022
deep_image_prior_extension

Code for "Is Deep Image Prior in Need of a Good Education?" Project page: https://jleuschn.github.io/docs.educated_deep_image_prior/. Supplementary Ma

riccardo barbano 7 Jan 09, 2022
Pytorch implementation of the paper: "A Unified Framework for Separating Superimposed Images", in CVPR 2020.

Deep Adversarial Decomposition PDF | Supp | 1min-DemoVideo Pytorch implementation of the paper: "Deep Adversarial Decomposition: A Unified Framework f

Zhengxia Zou 72 Dec 18, 2022
Masked regression code - Masked Regression

Masked Regression MR - Python Implementation This repositery provides a python implementation of MR (Masked Regression). MR can efficiently synthesize

Arbish Akram 1 Dec 23, 2021