Implementation of "Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021" in PyTorch

Overview

Auditory Slow-Fast

This repository implements the model proposed in the paper:

Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021

Project's webpage

arXiv paper

Citing

When using this code, kindly reference:

@ARTICLE{Kazakos2021SlowFastAuditory,
   title={Slow-Fast Auditory Streams For Audio Recognition},
   author={Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima},
           journal   = {CoRR},
           volume    = {abs/2103.03516},
           year      = {2021},
           ee        = {https://arxiv.org/abs/2103.03516},
}

Pretrained models

You can download our pretrained models on VGG-Sound and EPIC-KITCHENS-100:

  • Slow-Fast (EPIC-KITCHENS-100) link
  • Slow (EPIC-KITCHENS-100) link
  • Fast (EPIC-KITCHENS-100) link
  • Slow-Fast (VGG-Sound) link
  • Slow (VGG-Sound) link
  • Fast (VGG-Sound) link

Preparation

  • Requirements:
    • PyTorch 1.7.1
    • librosa: conda install -c conda-forge librosa
    • h5py: conda install h5py
    • wandb: pip install wandb
    • fvcore: pip install 'git+https://github.com/facebookresearch/fvcore'
    • simplejson: pip install simplejson
    • psutil: pip install psutil
    • tensorboard: pip install tensorboard
  • Add this repository to $PYTHONPATH.
export PYTHONPATH=/path/to/auditory-slow-fast/slowfast:$PYTHONPATH
  • VGG-Sound:
    1. Download the audio. For instructions see here
    2. Download train.pkl (link) and test.pkl (link). I converted the original train.csv and test.csv (found here) to pickle files with column names for easier use
  • EPIC-KITCHENS:
    1. From the annotation repository of EPIC-KITCHENS-100 (link), download: EPIC_100_train.pkl, EPIC_100_validation.pkl, and EPIC_100_test_timestamps.pkl. EPIC_100_train.pkl and EPIC_100_validation.pkl will be used for training/validation, while EPIC_100_test_timestamps.pkl can be used to obtain the scores to submit in the AR challenge.
    2. Download all the videos of EPIC-KITCHENS-100 using the download scripts found here, where you can also find detailed instructions on using the scripts.
    3. Extract audio from the videos by running:
    python audio_extraction/extract_audio.py /path/to/videos /output/path 
    
    1. Save audio in HDF5 format by running:
    python audio_extraction/wav_to_hdf5.py /path/to/audio /output/hdf5/EPIC-KITCHENS-100_audio.hdf5
    

Training/validation on EPIC-KITCHENS-100

To train the model run (fine-tuning from VGG-Sound pretrained model):

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To train from scratch remove TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model.

You can also train the individual streams. For example, for training Slow run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOW_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To validate the model run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True 
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

To obtain scores on the test set run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True 
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth 
EPICKITCHENS.TEST_LIST EPIC_100_test_timestamps.pkl EPICKITCHENS.TEST_SPLIT test

Training/validation on VGG-Sound

To train the model run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/output_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset 
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations 

To validate the model run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/experiment_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset 
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True 
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

License

The code is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, found here.

Owner
Evangelos Kazakos
Evangelos Kazakos
A2DP agent for promiscuous/permissive audio sinc.

Promiscuous Bluetooth audio sinc A2DP agent for promiscuous/permissive audio sinc for Linux. Once installed, a Bluetooth client, such as a smart phone

Jasper Aorangi 4 May 27, 2022
convert-to-opus-cli is a Python CLI program for converting audio files to opus audio format.

convert-to-opus-cli convert-to-opus-cli is a Python CLI program for converting audio files to opus audio format. Installation Must have installed ffmp

4 Dec 21, 2022
Simple discord bot by @merive 🤖

Parzibot Powerful and Useful Discord Bot on Python. The source code of the bot is available to everyone. Parzibot uses English language. This is free

merive_ 3 Dec 28, 2022
MusicBrainz Picard

MusicBrainz Picard MusicBrainz Picard is a cross-platform (Linux/Mac OS X/Windows) application written in Python and is the official MusicBrainz tagge

MetaBrainz Foundation 3k Dec 31, 2022
live coding in python + supercollider

live coding in python + supercollider

Zack 6 Feb 06, 2022
Sound-Equalizer- This is a Sound Equalizer GUI App Using Python's PyQt5

Sound-Equalizer- This is a Sound Equalizer GUI App Using Python's PyQt5. It gives you the ability to play, pause, and Equalize any one-channel wav audio file and play 3 different instruments.

Mustafa Megahed 1 Jan 10, 2022
Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)

Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)

Meinard Mueller 66 Jan 02, 2023
Voice package for Pycord adding extra features.

VoiceIO Voice package for Pycord adding extra features. Example Down bellow is an example of what you can currently do. import voiceio process = voic

pycord 1 Dec 24, 2021
A python program for visualizing MIDI files, and displaying them in a spiral layout

SpiralMusic_python A python program for visualizing MIDI files, and displaying them in a spiral layout For a hardware version using Teensy & LED displ

Gavin 6 Nov 23, 2022
Muzic: Music Understanding and Generation with Artificial Intelligence

Muzic is a research project on AI music that empowers music understanding and generation with deep learning and artificial intelligence.

Microsoft 2.6k Dec 30, 2022
This bot can stream audio or video files and urls in telegram voice chats

Voice Chat Streamer This bot can stream audio or video files and urls in telegram voice chats :) 🎯 Follow me and star this repo for more telegram bot

WiskeyWorm 4 Oct 09, 2022
pyo is a Python module written in C to help digital signal processing script creation.

pyo is a Python module written in C to help digital signal processing script creation.

Olivier Bélanger 1.1k Jan 01, 2023
This is an OverPowered Vc Music Player! Will work for you and play music in Voice Chatz

VcPlayer This is an OverPowered Vc Music Player! Will work for you and play music in Voice Chatz Telegram Voice-Chat Bot [PyTGCalls] ⇝ Requirements ⇜

1 Dec 20, 2021
Play any song directly into your group voice chat.

Telegram VCPlayer Bot Play any song directly into your group voice chat. Official Bot : VCPlayerBot | Discussion Group : VoiceChat Music Player Suppor

Shubham Kumar 50 Nov 21, 2022
The venturimeter works on the principle of Bernoulli's equation, i.e., the pressure decreases as the velocity increases.

The venturimeter works on the principle of Bernoulli's equation, i.e., the pressure decreases as the velocity increases. The cross-section of the throat is less than the cross-section of the inlet pi

Shankar Mahadevan L 1 Dec 03, 2021
Some utils for auto speech recognition

About Some utils for auto speech recognition. Utils Util Description Script Reset audio Reset sample rate, sample width, etc of audios.

1 Jan 24, 2022
Python module for handling audio metadata

Mutagen is a Python module to handle audio metadata. It supports ASF, FLAC, MP4, Monkey's Audio, MP3, Musepack, Ogg Opus, Ogg FLAC, Ogg Speex, Ogg The

Quod Libet 1.1k Dec 31, 2022
Music bot of # Owner

Pokimane-Music Music bot of # Owner How To Host The easiest way to deploy this Bot Support Channel :- TeamDlt Support Group :- TeamDlt Please fork thi

5 Dec 23, 2022
Make an audio file (really) long-winded

longwind Make an audio file (really) long-winded Daily repetitions are an illusion anyway.

Vincent Lostanlen 2 Sep 12, 2022
Audio pitch-shifting & re-sampling utility, based on the EMU SP-1200

Pitcher.py Free & OS emulation of the SP-12 & SP-1200 signal chain (now with GUI) Pitch shift / bitcrush / resample audio files Written and tested in

morgan 13 Oct 03, 2022