Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Last update: Jan 03, 2023

Overview

Espresso

Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented.

We provide state-of-the-art training recipes for the following speech datasets:

What's New:

April 2021: On-the-fly feature extraction from raw waveforms with torchaudio is supported. A LibriSpeech recipe has been released that has no dependency on Kaldi and uses YAML files (via Hydra) to configure experiments.
June 2020: Transformer recipes released.
April 2020: Both E2E LF-MMI (using PyChain) and Cross-Entropy training for hybrid ASR are now supported. WSJ recipes are provided as examples of each.
March 2020: SpecAugment is supported and relevant recipes are released.
September 2019: We are working to isolate Espresso from fairseq, which will result in a standalone package that can be installed directly with pip.

Requirements and Installation

PyTorch version >= 1.5.0
Python version >= 3.6
For training new models, you'll also need an NVIDIA GPU and NCCL
To install Espresso from source and develop locally:

git clone https://github.com/freewym/espresso
cd espresso
pip install --editable .

# on macOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
pip install kaldi_io sentencepiece soundfile
cd espresso/tools; make KALDI=<path/to/a/compiled/kaldi/directory>

Add your Python path to the PATH variable in examples/asr_<dataset>/path.sh; the current default is ~/anaconda3/bin.

kaldi_io is required for reading Kaldi scp files. sentencepiece is required for training and encoding subword pieces. soundfile is required for reading raw waveform files. Kaldi is required for data preparation, feature extraction, scoring for some datasets (e.g., Switchboard), and decoding for all hybrid systems.
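The requirements listed above can be checked before running a recipe. The following is a minimal sketch, not part of Espresso itself: the helper names (parse_version, meets_minimum, check_environment) are hypothetical, and the minimum versions are simply copied from the list above. It only reports whether each package can be found and does not verify that Kaldi has been compiled.

```python
import importlib
import importlib.util
import re
import sys

# Minimum versions taken from the requirements list in this README.
MIN_PYTHON = (3, 6)
MIN_TORCH = (1, 5, 0)
# Extra Python packages installed with pip in the steps above.
EXTRA_PACKAGES = ("kaldi_io", "sentencepiece", "soundfile")


def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into a tuple like (1, 13, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component (e.g. '1.5') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def meets_minimum(version_text, minimum):
    """Return True if version_text is at least the given (major, minor, patch) tuple."""
    return parse_version(version_text) >= minimum


def check_environment():
    """Return a dict mapping each requirement name to True (satisfied) or False."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    try:
        torch = importlib.import_module("torch")
        report["torch"] = meets_minimum(str(torch.__version__), MIN_TORCH)
    except ImportError:
        report["torch"] = False
    for name in EXTRA_PACKAGES:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, satisfied in check_environment().items():
        print(f"{requirement}: {'ok' if satisfied else 'MISSING or too old'}")
```

Running the script prints one line per requirement, which makes it easy to see which pip install step still needs to be repeated.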

If you want to use PyChain for LF-MMI training, you also need to install PyChain (and OpenFst):

Edit the PYTHON_DIR variable in espresso/tools/Makefile (default: ~/anaconda3/bin), and then run:

cd espresso/tools; make openfst pychain

For faster training, install NVIDIA's apex library:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

License

Espresso is MIT-licensed.

Citation

Please cite Espresso as:

@inproceedings{wang2019espresso,
  title = {Espresso: A Fast End-to-end Neural Speech Recognition Toolkit},
  author = {Yiming Wang and Tongfei Chen and Hainan Xu 
            and Shuoyang Ding and Hang Lv and Yiwen Shao 
            and Nanyun Peng and Lei Xie and Shinji Watanabe 
            and Sanjeev Khudanpur},
  booktitle = {2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  year = {2019},
}

Owner

Yiming Wang