Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Last update: May 11, 2022

Overview

Keyword Spotting Transformer

This is an unofficial TensorFlow implementation of the Keyword Spotting Transformer model. The model is trained on the 35-word Speech Commands dataset (v0.02).

Paper: Keyword Transformer: A Self-Attention Model for Keyword Spotting

Model architecture

Download the dataset

To download the dataset and extract it into ./data, run the following commands:

wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir data
mv ./speech_commands_v0.02.tar.gz ./data
cd ./data
tar -xf ./speech_commands_v0.02.tar.gz
cd ../
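
After extraction it can help to confirm that the keyword folders are in place before training. The following is a minimal sketch, not part of the repository: `list_keywords` and `count_clips` are hypothetical helper names, and the sketch assumes the archive unpacks into one folder per keyword, with non-keyword folders (such as `_background_noise_`) prefixed by an underscore.

```python
from pathlib import Path


def list_keywords(data_dir):
    """Return sorted keyword folder names found directly under data_dir.

    Folders whose names start with an underscore (such as
    _background_noise_) are skipped because they do not hold keyword clips.
    """
    root = Path(data_dir)
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and not p.name.startswith("_")
    )


def count_clips(data_dir):
    """Map each keyword folder to the number of .wav files it contains."""
    root = Path(data_dir)
    return {
        name: len(list((root / name).glob("*.wav")))
        for name in list_keywords(root)
    }
```

If the extraction completed, `len(list_keywords("./data"))` should equal 35, matching the number of words in the dataset.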

Setup virtual environment

virtualenv -p python3 venv
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt

Training the model

To train the model, run the following command, replacing each ${...} placeholder with your own value:

python3 train.py --data_dir ${Path to data directory} \
                 --logdir ${Path to log directory} \
                 --num_layers ${Number of sequential encoder layers} \
                 --d_model ${Dimension of the encoder layers} \
                 --num_heads ${Number of heads in multi head attention layer} \
                 --mlp_dim ${Dimension of mlp layers} \
                 --lr ${Learning rate} \
                 --weight_decay ${Weight decay} \
                 --batch_size ${Batch size} \
                 --epochs ${Number of epochs} \
                 --save_dir ${Directory to save the model weights}
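
Because the command takes many flags, one option is to keep the settings in a dictionary and generate the command line from it. The sketch below is an assumption-laden illustration rather than part of the repository: `build_train_command` and `EXAMPLE_CONFIG` are hypothetical names, the flag names are copied from the command above, and every value is an illustrative placeholder, not a recommended setting.

```python
import shlex

# Flag names come from the training command above; the values are
# illustrative placeholders, not settings recommended by the repository.
EXAMPLE_CONFIG = {
    "data_dir": "./data",
    "logdir": "./logs",
    "num_layers": 12,
    "d_model": 64,
    "num_heads": 1,
    "mlp_dim": 256,
    "lr": 0.001,
    "weight_decay": 0.1,
    "batch_size": 512,
    "epochs": 50,
    "save_dir": "./checkpoints",
}


def build_train_command(config, script="train.py"):
    """Return the argv list for the training script from a flag-to-value dict."""
    argv = ["python3", script]
    for flag, value in config.items():
        argv += ["--" + flag, str(value)]
    return argv


if __name__ == "__main__":
    # Print a shell command with proper quoting, ready to copy and paste.
    print(shlex.join(build_train_command(EXAMPLE_CONFIG)))
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell quoting issues when a path contains spaces.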

To track training metrics, start TensorBoard and point it at the same log directory:

tensorboard --logdir ${Path to log directory}

Predicting keyword of audio file

To predict the keyword spoken in an audio file, run:

python3 test.py --model_dir ${Saved model directory} \
                --file_path ${Audio file}
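
Before running a prediction on your own recording, it may be worth checking that the file resembles the training clips. Speech Commands clips are mono, 16 kHz WAV files of about one second; whether test.py resamples or pads other inputs is not stated here, so treat the check below as a precaution. It is a sketch using only the standard library, and `describe_wav` and `looks_like_speech_commands_clip` are hypothetical helper names.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / float(rate)
    return rate, channels, duration


def looks_like_speech_commands_clip(path, expected_rate=16000, max_seconds=1.0):
    """Check that a file matches the dataset's clip format.

    Assumption: the model expects the same format as the training data,
    that is mono audio at 16 kHz lasting at most about one second.
    """
    rate, channels, duration = describe_wav(path)
    return rate == expected_rate and channels == 1 and duration <= max_seconds
```

A file that fails this check can usually be converted with an audio tool before it is passed to `--file_path`.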


Owner

Intelligent Machines Limited
