A minimal Conformer ASR implementation adapted from ESPnet.

Last update: Jan 24, 2022

Conformer ASR

Introduction

I want to use the pre-trained English ASR model provided by ESPnet, but ESPnet is relatively heavy for my needs. This repository extracts only the Conformer ASR part of ESPnet so that it is easier to customize.

Many pre-trained ASR models are available. I chose the one named:

kamo-naoyuki/librispeech_asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_bpe5000_scheduler_confwarmup_steps40000_optim_conflr0.0025_sp_valid.acc.ave

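Its performance is reported [here](https://zenodo.org/record/4604066#.YbxsX5FByV4) and reproduced below. Following ESPnet's usual results layout, the three tables give, in order, the word (WER), character (CER), and token (TER) error rates, so the Wrd column counts words, characters, and BPE tokens respectively. All rates are percentages.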

dataset	Snt	Wrd	Corr	Sub	Del	Ins	Err	S.Err
decode_asr_asr_model_valid.acc.ave/dev_clean	2703	54402	97.9	1.9	0.2	0.2	2.3	28.6
decode_asr_asr_model_valid.acc.ave/dev_other	2864	50948	94.5	5.1	0.5	0.6	6.1	48.3
decode_asr_asr_model_valid.acc.ave/test_clean	2620	52576	97.7	2.1	0.2	0.3	2.6	31.4
decode_asr_asr_model_valid.acc.ave/test_other	2939	52343	94.7	4.9	0.5	0.7	6.0	49.0
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_clean	2703	54402	98.3	1.5	0.2	0.2	1.9	25.2
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_other	2864	50948	95.8	3.7	0.4	0.5	4.6	40.0
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_clean	2620	52576	98.1	1.7	0.2	0.3	2.1	26.2
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_other	2939	52343	95.8	3.7	0.5	0.5	4.7	42.4

dataset	Snt	Wrd	Corr	Sub	Del	Ins	Err	S.Err
decode_asr_asr_model_valid.acc.ave/dev_clean	2703	288456	99.4	0.3	0.2	0.2	0.8	28.6
decode_asr_asr_model_valid.acc.ave/dev_other	2864	265951	98.0	1.2	0.8	0.7	2.7	48.3
decode_asr_asr_model_valid.acc.ave/test_clean	2620	281530	99.4	0.3	0.3	0.3	0.9	31.4
decode_asr_asr_model_valid.acc.ave/test_other	2939	272758	98.2	1.0	0.7	0.7	2.5	49.0
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_clean	2703	288456	99.5	0.3	0.2	0.2	0.7	25.2
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_other	2864	265951	98.3	1.0	0.7	0.5	2.2	40.0
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_clean	2620	281530	99.5	0.3	0.3	0.2	0.7	26.2
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_other	2939	272758	98.5	0.8	0.7	0.5	2.1	42.4

dataset	Snt	Wrd	Corr	Sub	Del	Ins	Err	S.Err
decode_asr_asr_model_valid.acc.ave/dev_clean	2703	68010	97.5	1.9	0.7	0.4	2.9	28.6
decode_asr_asr_model_valid.acc.ave/dev_other	2864	63110	93.4	5.0	1.6	1.0	7.6	48.3
decode_asr_asr_model_valid.acc.ave/test_clean	2620	65818	97.2	2.0	0.8	0.4	3.3	31.4
decode_asr_asr_model_valid.acc.ave/test_other	2939	65101	93.7	4.5	1.8	0.9	7.2	49.0
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_clean	2703	68010	97.8	1.5	0.7	0.3	2.5	25.2
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_other	2864	63110	94.6	3.8	1.6	0.7	6.1	40.0
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_clean	2620	65818	97.6	1.6	0.8	0.3	2.7	26.2
decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_other	2939	65101	94.7	3.5	1.8	0.7	6.0	42.4
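The Sub, Del, and Ins columns above are the substitution, deletion, and insertion rates of an alignment between reference and hypothesis, and Err is their sum (up to rounding). The sketch below shows one way such counts can be obtained with a plain edit-distance alignment. It is an illustration only: `error_counts` and `error_rate` are hypothetical helper names, not part of this repository or of ESPnet, and a real scoring tool may break ties between equally good alignments differently.

```python
from typing import List, Tuple


def error_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    # cost[i][j] = (total_edits, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    cost = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference token is deleted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis token is inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]  # match, no edit needed
                continue
            t, s, d, n = cost[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, n)
            t, s, d, n = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = cost[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Tuples compare by total edit count first, so min() picks a cheapest path.
            cost[i][j] = min(substitution, deletion, insertion)
    _, s, d, n = cost[len(ref)][len(hyp)]
    return s, d, n


def error_rate(ref: List[str], hyp: List[str]) -> float:
    """Error rate in percent: (Sub + Del + Ins) / number of reference tokens."""
    s, d, n = error_counts(ref, hyp)
    return 100.0 * (s + d + n) / len(ref)
```

Splitting the strings on whitespace gives a word-level rate; passing lists of characters or BPE tokens instead would give the character-level or token-level counterpart.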

ASR step by step

1. Setup code

pip install .

2. Download the model and unzip it

wget "https://zenodo.org/record/4604066/files/asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_bpe5000_scheduler_confwarmup_steps40000_optim_conflr0.0025_sp_valid.acc.ave.zip?download=1" -O conformer.zip
unzip conformer.zip
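The example in the next step needs three files from the extracted archive: a `config.yaml`, a `bpe.model`, and a `.pth` checkpoint. The directory names inside the archive may not match the paths hard-coded in the example, so it can help to list the candidates first. The helper below is a minimal sketch using only the standard library; `find_model_files` is a hypothetical name, not something this package provides.

```python
from pathlib import Path
from typing import Dict, List


def find_model_files(root: str) -> Dict[str, List[Path]]:
    """Collect candidate config, BPE model, and checkpoint files under `root`."""
    base = Path(root)
    return {
        "config": sorted(base.rglob("config.yaml")),
        "bpe": sorted(base.rglob("bpe.model")),
        "checkpoint": sorted(base.rglob("*.pth")),
    }
```

Calling `find_model_files(".")` in the directory where the archive was unzipped returns the matching paths. An archive may contain more than one match per kind (for example, a separate language-model config), so inspect the lists rather than blindly taking the first entry.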

3. Run an example

import torch
import librosa
from mmds.utils.spectrogram import MelSpectrogram
from conformer_asr import Conformer, Tokenizer

sample_rate = 16000
cfg_path = "./exp_unnorm/asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_unnorm_bpe5000/config.yaml"
bpe_path = "./data/en_unnorm_token_list/bpe_unigram5000/bpe.model"
ckpt_path = "./exp_unnorm/asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_unnorm_bpe5000/valid.acc.ave_10best.pth"

tokenizer = Tokenizer(cfg_path, bpe_path)
conformer = Conformer(tokenizer, ckpt_path=ckpt_path)
conformer.eval()

spec_fn = MelSpectrogram(
    sample_rate,
    hop_length=256,
    f_min=0,
    f_max=8000,
    win_length=512,
    power=2,
)

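# Load the audio and resample it to the model's sample rate.
# Recent librosa versions accept `sr` only as a keyword argument.
w0, _ = librosa.load("./example.m4a", sr=sample_rate)
w0 = torch.from_numpy(w0)
m0 = spec_fn(w0).t()  # transpose so that the first axis is time

n_frames = len(m0)

# Create a batch of utterances with different lengths (variable-length batches are supported).
x = [m0, m0[: n_frames // 2], m0[: n_frames // 4]]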

ref = "This is a test video for youtube-dl. For more information, contact [email protected]".lower()
hyps = conformer.decode(x, beam_width=20)

print("REF", ref)
for hyp in hyps:
    print("HYP", hyp.lower())
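The `beam_width=20` argument above controls how many partial hypotheses the decoder keeps alive at each step. The toy sketch below illustrates the general idea of beam search and why a wider beam can find a better sequence than greedy decoding. It is not this package's decoder: `beam_search` and `toy_model` are hypothetical, the scores are made up, and a real ASR decoder also handles end-of-sequence tokens and scores from the acoustic model.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]


def beam_search(
    next_log_probs: Callable[[Prefix], Dict[str, float]],
    num_steps: int,
    beam_width: int,
) -> List[Tuple[Prefix, float]]:
    """Extend every kept prefix by one token per step, keeping the best `beam_width`."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(num_steps):
        candidates = [
            (prefix + (token,), score + log_prob)
            for prefix, score in beams
            for token, log_prob in next_log_probs(prefix).items()
        ]
        # Highest cumulative log-probability first, then prune to the beam width.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_model(prefix: Prefix) -> Dict[str, float]:
    """Made-up scorer where the best first token does not start the best sequence."""
    if prefix == ():
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if prefix == ("a",):
        return {"x": math.log(0.55), "y": math.log(0.45)}
    return {"x": math.log(0.9), "y": math.log(0.1)}


greedy = beam_search(toy_model, num_steps=2, beam_width=1)[0]
wider = beam_search(toy_model, num_steps=2, beam_width=2)[0]
print(greedy[0], round(math.exp(greedy[1]), 2))  # ('a', 'x') 0.33
print(wider[0], round(math.exp(wider[1]), 2))    # ('b', 'x') 0.36
```

With a beam of 1 the search commits to `a` and ends at probability 0.33. With a beam of 2 it keeps `b` around long enough to find the 0.36 sequence. Larger beams cost more compute per step, which is the trade-off behind choosing a value such as 20.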

Results

REF this is a test video for youtube-dl. for more information, contact [email protected]
HYP this is a test video for you do bl for more information -- contact the hih aging at the hihaging, not the
HYP this is a test for you d bl for more information
HYP this is a testim for you to

Owner

Niu Zhe
