Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Last update: May 11, 2022

Overview

Keyword Spotting Transformer

This is an unofficial TensorFlow implementation of the Keyword Spotting Transformer model. The model is trained on the 35-word Speech Commands dataset (v0.02).

Paper: Keyword Transformer: A Self-Attention Model for Keyword Spotting

Model architecture

Download the dataset

To download the dataset and extract it into ./data, run the following commands:

wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir data
mv ./speech_commands_v0.02.tar.gz ./data
cd ./data
tar -xf ./speech_commands_v0.02.tar.gz
cd ../
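After extraction, it can help to confirm that the keyword folders are where the training script will look for them. The following is a minimal sketch, not part of this repository. It assumes that each keyword is a sub-directory of WAV clips under ./data and that helper folders such as `_background_noise_` start with an underscore; the function names `list_keyword_classes` and `count_clips` are hypothetical.

```python
from pathlib import Path


def list_keyword_classes(data_dir):
    """Return the sorted keyword folder names found directly under data_dir.

    Assumption: helper folders (for example `_background_noise_`) start with
    an underscore and are not keyword classes.
    """
    root = Path(data_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not entry.name.startswith("_")
    )


def count_clips(data_dir):
    """Map each keyword class to the number of .wav files in its folder."""
    root = Path(data_dir)
    return {
        name: sum(1 for _ in (root / name).glob("*.wav"))
        for name in list_keyword_classes(root)
    }


# Example usage (run from the repository root after extracting the archive):
#   classes = list_keyword_classes("./data")
#   print(len(classes), "keyword classes found")
#   print(count_clips("./data"))
```

If the class count differs from 35, the archive was probably extracted into a different directory than the one passed to --data_dir.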

Set up a virtual environment

virtualenv -p python3 venv
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt

Training the model

To train the model, run the following command, replacing each ${...} placeholder with your own value:

python3 train.py --data_dir ${Path to data directory} \
                 --logdir ${Path to log directory} \
                 --num_layers ${Number of sequential encoder layers} \
                 --d_model ${Dimension of the encoder layers} \
                 --num_heads ${Number of heads in multi head attention layer} \
                 --mlp_dim ${Dimension of mlp layers} \
                 --lr ${Learning rate} \
                 --weight_decay ${Weight decay} \
                 --batch_size ${Batch size} \
                 --epochs ${Number of epochs} \
                 --save_dir ${Directory to save the model weights}
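For repeated experiments, the command above can be assembled from Python instead of being typed by hand. The sketch below is an illustration only. The helper `build_train_command` is hypothetical, it assumes every flag listed above is passed explicitly, and the hyperparameter values in the usage comment are placeholders rather than settings recommended by this repository.

```python
import shlex

# Hyperparameter flags taken from the train.py command shown above.
HYPERPARAM_FLAGS = (
    "num_layers",
    "d_model",
    "num_heads",
    "mlp_dim",
    "lr",
    "weight_decay",
    "batch_size",
    "epochs",
)


def build_train_command(data_dir, logdir, save_dir, **hyperparams):
    """Return the train.py invocation as an argument list for subprocess.run."""
    missing = [flag for flag in HYPERPARAM_FLAGS if flag not in hyperparams]
    unknown = [key for key in hyperparams if key not in HYPERPARAM_FLAGS]
    if missing or unknown:
        raise ValueError(f"missing flags: {missing}, unknown flags: {unknown}")

    cmd = ["python3", "train.py", "--data_dir", str(data_dir), "--logdir", str(logdir)]
    for flag in HYPERPARAM_FLAGS:
        cmd += [f"--{flag}", str(hyperparams[flag])]
    cmd += ["--save_dir", str(save_dir)]
    return cmd


# Example usage (values are illustrative placeholders):
#   import subprocess
#   cmd = build_train_command(
#       "./data", "./logs", "./weights",
#       num_layers=12, d_model=64, num_heads=1, mlp_dim=256,
#       lr=0.001, weight_decay=0.1, batch_size=512, epochs=10,
#   )
#   print(shlex.join(cmd))          # inspect the command before running it
#   subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when a path contains spaces.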

To track training metrics, launch TensorBoard with the same log directory:

tensorboard --logdir ${Path to log directory}

Predicting keyword of audio file

To predict the keyword spoken in an audio file, run:

python3 test.py --model_dir ${Saved model directory} \
                --file_path ${Audio file}
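Clips in the Speech Commands dataset are mono 16 kHz WAV files of roughly one second. Whether test.py resamples or pads other inputs is not stated here, so inspecting a file before prediction may save some debugging. The sketch below uses only the Python standard library; `describe_wav` is a hypothetical helper, not part of this repository.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file.

    Useful for checking that an input clip resembles the training data
    (mono, 16 kHz, about one second) before running prediction on it.
    """
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


# Example usage:
#   info = describe_wav("my_clip.wav")
#   if info["sample_rate"] != 16000 or info["channels"] != 1:
#       print("Clip differs from the dataset format:", info)
```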

Owner

Intelligent Machines Limited
