Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Last update: May 11, 2022

Overview

Keyword Spotting Transformer

This is the unofficial TensorFlow implementation of the Keyword Spotting Transformer model. This model is used to train on the 35 words speech command dataset

Paper : Keyword Transformer: A Self-Attention Model for Keyword Spotting

Model architecture

Download the dataset

To download the dataset use the following command

wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir data
mv ./speech_commands_v0.02.tar.gz ./data
cd ./data
tar -xf ./speech_commands_v0.02.tar.gz
cd ../

Setup virtual environment

virtualenv -p python3 venv
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt

Training the model

To train the model run this command

python3 train.py --data_dir ${Path to data directory} \
                 --logdir ${Path to log directory} \
                 --num_layers ${Number of sequential encoder layers} \
                 --d_model ${Dimension of the encoder layers} \
                 --num_heads ${Number of heads in multi head attention layer} \
                 --mlp_dim ${Dimension of mlp layers} \
                 --lr ${Learning rate} \
                 --weight_decay ${Weight decay} \
                 --batch_size ${Batch size} \
                 --epochs ${Number of epochs} \
                 --save_dir ${Directory to save the model weights}

To track your training metrics

tensorboard --logdir  ${Path to log directory}

Predicting keyword of audio file

To predict the keyword of the audio file

python3 test.py --model_dir ${Saved model directory} \
                --file_path ${Audio file}

Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Related tags

Overview

Keyword Spotting Transformer

Model architecture

Download the dataset

Setup virtual environment

Install dependencies

Training the model

Predicting keyword of audio file

Owner

Intelligent Machines Limited

HODEmu, is both an executable and a python library that is based on Ragagnin 2021 in prep.

Text completion with Hugging Face and TensorFlow.js running on Node.js

TransferNet: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network

CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

PyTorch code for our ECCV 2018 paper "Image Super-Resolution Using Very Deep Residual Channel Attention Networks"

Kaggle: Cell Instance Segmentation

Official Pytorch implementation for Deep Contextual Video Compression, NeurIPS 2021

Official PyTorch implementation of "IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos", CVPRW 2021

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

Code for HLA-Face: Joint High-Low Adaptation for Low Light Face Detection (CVPR21)

A curated list of long-tailed recognition resources.

Official code of the paper "Expanding Low-Density Latent Regions for Open-Set Object Detection" (CVPR 2022)

(CVPR 2022) A minimalistic mapless end-to-end stack for joint perception, prediction, planning and control for self driving.

PyTorch implementation for "Sharpness-aware Quantization for Deep Neural Networks".

Pre-Training 3D Point Cloud Transformers with Masked Point Modeling

Official implementation of Few-Shot and Continual Learning with Attentive Independent Mechanisms

ColossalAI-Examples - Examples of training models with hybrid parallelism using ColossalAI

TinyML Cookbook, published by Packt

A highly efficient, fast, powerful and light-weight anime downloader and streamer for your favorite anime.

Causal Imitative Model for Autonomous Driving