PIKA: a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi

Related tags

Deep Learningpika
Overview

PIKA: a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi

PIKA is a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi. The first release focuses on end-to-end speech recognition. We use Pytorch as deep learning engine, Kaldi for data formatting and feature extraction.

Key Features

  • On-the-fly data augmentation and feature extraction loader

  • TDNN Transformer encoder and convolution and transformer based decoder model structure

  • RNNT training and batch decoding

  • RNNT decoding with external Ngram FSTs (on-the-fly rescoring, aka, shallow fusion)

  • RNNT Minimum Bayes Risk (MBR) training

  • LAS forward and backward rescorer for RNNT

  • Efficient BMUF (Block model update filtering) based distributed training

Installation and Dependencies

In general, we recommend Anaconda since it comes with most dependencies. Other major dependencies include,

Pytorch

Please go to https://pytorch.org/ for pytorch installation, codes and scripts should be able to run against pytorch 0.4.0 and above. But we recommend 1.0.0 above for compatibility with RNNT loss module (see below)

Pykaldi and Kaldi

We use Kaldi (https://github.com/kaldi-asr/kaldi)) and PyKaldi (a python wrapper for Kaldi) for data processing, feature extraction and FST manipulations. Please go to Pykaldi website https://github.com/pykaldi/pykaldi for installation and make sure to build Pykaldi with ninja for efficiency. After following the installation process of pykaldi, you should have both Kaldi and Pykaldi dependencies ready.

CUDA-Warp RNN-Transducer

For RNNT loss module, we adopt the pytorch binding at https://github.com/1ytic/warp-rnnt

Others

Check requirements.txt for other dependencies.

Get Started

To get started, check all the training and decoding scripts located in egs directory.

I. Data preparation and RNNT training

egs/train_transducer_bmuf_otfaug.sh contains data preparation and RNNT training. One need to prepare training data and specify the training data directory,

#training data dir must contain wav.scp and label.txt files
#wav.scp: standard kaldi wav.scp file, see https://kaldi-asr.org/doc/data_prep.html 
#label.txt: label text file, the format is, uttid sequence-of-integer, where integer
#           is one-based indexing mapped label, note that zero is reserved for blank,  
#           ,eg., utt_id_1 3 5 7 10 23 
train_data_dir=

II. Continue with MBR training

With RNNT trained model, one can continued MBR training with egs/train_transducer_mbr_bmuf_otfaug.sh (assuming using the same training data, therefore data preparation is omitted). Make sure to specify the initial model,

--verbose \
--optim sgd \
--init_model $exp_dir/init.model \
--rnnt_scale 1.0 \
--sm_scale 0.8 \

III. Training LAS forward and backward rescorer

One can train a forward and backward LAS rescorer for your RNN-T model using egs/train_las_rescorer_bmuf_otfaug.sh. The LAS rescorer will share the encoder part with RNNT model, and has extra two-layer LSTM as additional encoder, make sure to specify the encoder sharing as,

--num_batches_per_epoch 526264 \
--shared_encoder_model $exp_dir/final.model \
--num_epochs 5 \

We support bi-directional LAS rescoring, i.e., forward and backward rescoring. Backward (right-to-left) rescoring is achieved by reversing sequential labels when conducting LAS model training. One can easily perform a backward LAS rescorer training by specifying,

--reverse_labels

IV. Decoding

egs/eval_transducer.sh is the main evluation script, which contains the decoding pipeline. Forward and backward LAS rescoring can be enabled by specifying these two models,

##########configs#############
#rnn transducer model
rnnt_model=
#forward and backward las rescorer model
lasrescorer_fw=
lasrescorer_bw=

Caveats

All the training and decoding hyper-parameters are adopted based on large-scale (e.g., 60khrs) training and internal evaluation data. One might need to re-tune hyper-parameters to acheive optimal performances. Also the WER (CER) scoring script is based on a Mandarin task, we recommend those who work on different languages rewrite scoring scripts.

References

[1] Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition, Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu, InterSpeech 2018

[2] Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition, Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu, InterSpeech 2020

Citations

@inproceedings{Weng2020,
  author={Chao Weng and Chengzhu Yu and Jia Cui and Chunlei Zhang and Dong Yu},
  title={{Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={966--970},
  doi={10.21437/Interspeech.2020-1221},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1221}
}

@inproceedings{Weng2018,
  author={Chao Weng and Jia Cui and Guangsen Wang and Jun Wang and Chengzhu Yu and Dan Su and Dong Yu},
  title={Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={761--765},
  doi={10.21437/Interspeech.2018-1030},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1030}
}

Disclaimer

This is not an officially supported Tencent product

Owner
Research repositories.
Pytorch implementation of the unsupervised object discovery method LOST.

LOST Pytorch implementation of the unsupervised object discovery method LOST. More details can be found in the paper: Localizing Objects with Self-Sup

Valeo.ai 189 Dec 25, 2022
When BERT Plays the Lottery, All Tickets Are Winning

When BERT Plays the Lottery, All Tickets Are Winning Large Transformer-based models were shown to be reducible to a smaller number of self-attention h

Sai 16 Nov 10, 2022
Repo for "Physion: Evaluating Physical Prediction from Vision in Humans and Machines" submission to NeurIPS 2021 (Datasets & Benchmarks track)

Physion: Evaluating Physical Prediction from Vision in Humans and Machines This repo contains code and data to reproduce the results in our paper, Phy

Cognitive Tools Lab 38 Jan 06, 2023
Fairness Metrics: All you need to know

Fairness Metrics: All you need to know Testing machine learning software for ethical bias has become a pressing current concern. Recent research has p

Anonymous2020 1 Jan 17, 2022
PyTorch implementation of D2C: Diffuison-Decoding Models for Few-shot Conditional Generation.

D2C: Diffuison-Decoding Models for Few-shot Conditional Generation Project | Paper PyTorch implementation of D2C: Diffuison-Decoding Models for Few-sh

Jiaming Song 90 Dec 27, 2022
Reproducible research and reusable acyclic workflows in Python. Execute code on HPC systems as if you executed them on your personal computer!

Reproducible research and reusable acyclic workflows in Python. Execute code on HPC systems as if you executed them on your machine! Motivation Would

Joeri Hermans 15 Sep 11, 2022
Architecture Patterns with Python (TDD, DDD, EDM)

architecture-traning Architecture Patterns with Python (TDD, DDD, EDM) Chapter 5. 높은 기어비와 낮은 기어비의 TDD 5.2 도메인 계층 테스트를 서비스 계층으로 옮겨야 하는가? 도메인 계층 테스트 def

minsung sim 2 Mar 04, 2022
An implementation of the efficient attention module.

Efficient Attention An implementation of the efficient attention module. Description Efficient attention is an attention mechanism that substantially

Shen Zhuoran 194 Dec 15, 2022
A 3D sparse LBM solver implemented using Taichi

taichi_LBM3D Background Taichi_LBM3D is a 3D lattice Boltzmann solver with Multi-Relaxation-Time collision scheme and sparse storage structure impleme

Jianhui Yang 121 Jan 06, 2023
My course projects for the 2021 Spring Machine Learning course at the National Taiwan University (NTU)

ML2021Spring There are my projects for the 2021 Spring Machine Learning course at the National Taiwan University (NTU) Course Web : https://speech.ee.

Ding-Li Chen 15 Aug 29, 2022
Adversarial Attacks are Reversible via Natural Supervision

Adversarial Attacks are Reversible via Natural Supervision ICCV2021 Citation @InProceedings{Mao_2021_ICCV, author = {Mao, Chengzhi and Chiquier

Computer Vision Lab at Columbia University 20 May 22, 2022
A program that can analyze videos according to the weights you select

MaskMonitor A program that can analyze videos according to the weights you select 下載 訓練完的 weight檔案 執行 MaskDetection.py 內部可更改 輸入來源(鏡頭, 影片, 圖片) 以及輸出條件(人

Patrick_star 1 Nov 07, 2021
Analyses of the individual electric field magnitudes with Roast.

Aloi Davide - PhD Student (UoB) Analysis of electric field magnitudes (wp2a dataset only at the moment) and correlation analysis with Dynamic Causal M

Davide Aloi 7 Dec 15, 2022
Predict the latency time of the deep learning models

Deep Neural Network Prediction Step 1. Genernate random parameters and Run them sequentially : $ python3 collect_data.py -gp -ep -pp -pl pooling -num

QAQ 1 Nov 12, 2021
Using CNN to mimic the driver based on training data from Torcs

Behavioural-Cloning-in-autonomous-driving Using CNN to mimic the driver based on training data from Torcs. Approach First, the data was collected from

Sudharshan 2 Jan 05, 2022
Deep Residual Learning for Image Recognition

Deep Residual Learning for Image Recognition This is a Torch implementation of "Deep Residual Learning for Image Recognition",Kaiming He, Xiangyu Zhan

Kimmy 561 Dec 01, 2022
DGL-TreeSearch and the Gurobi-MWIS interface

Independent Set Benchmarking Suite This repository contains the code for our maximum independent set benchmarking suite as well as our implementations

Maximilian Böther 19 Nov 22, 2022
Machine Learning Toolkit for Kubernetes

Kubeflow the cloud-native platform for machine learning operations - pipelines, training and deployment. Documentation Please refer to the official do

Kubeflow 12.1k Jan 03, 2023
Code for "ATISS: Autoregressive Transformers for Indoor Scene Synthesis", NeurIPS 2021

ATISS: Autoregressive Transformers for Indoor Scene Synthesis This repository contains the code that accompanies our paper ATISS: Autoregressive Trans

138 Dec 22, 2022
Neural implicit reconstruction experiments for the Vector Neuron paper

Neural Implicit Reconstruction with Vector Neurons This repository contains code for the neural implicit reconstruction experiments in the paper Vecto

Congyue Deng 35 Jan 02, 2023