A PyTorch Implementation of End-to-End Models for Speech-to-Text

Last update: Dec 25, 2022

speech

Speech is an open-source package to build end-to-end models for automatic speech recognition. Sequence-to-sequence models with attention, Connectionist Temporal Classification and the RNN Sequence Transducer are currently supported.
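As a rough illustration of the idea behind Connectionist Temporal Classification, the sketch below shows the usual collapsing rule applied to a per-frame label sequence: consecutive repeats are merged, then blank labels are dropped. It is not taken from this package; the function name `ctc_collapse` and the choice of 0 as the blank index are assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence: merge repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive labels.
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# A blank between two identical labels keeps them as separate outputs.
frames = [0, 3, 3, 0, 3, 5, 5, 0]
print(ctc_collapse(frames))  # [3, 3, 5]
```

The blank label is what lets a model emit the same symbol twice in a row: without the blank between the two runs of 3, they would be merged into one.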

The goal of this software is to facilitate research in end-to-end models for speech recognition. The models are implemented in PyTorch.

The software has only been tested with Python 3.6.

We will not be providing backward compatibility for Python 2.7.
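If you want to check your interpreter against these notes before installing, something like the following could be used. This is only a sketch: the helper `python_support_status` and its three labels are hypothetical and are not part of the package.

```python
import sys

TESTED_VERSION = (3, 6)  # the only version these notes report as tested


def python_support_status(version_info=None):
    """Return 'tested', 'untested', or 'unsupported' for an interpreter version."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if major < 3:
        return "unsupported"  # Python 2 is not supported
    if (major, minor) == TESTED_VERSION:
        return "tested"
    return "untested"  # other Python 3 versions may work but are unverified


print(python_support_status())
```

The function reports the situation without exiting, since other Python 3 versions may still work even though they have not been tested.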

Install

We recommend creating a virtual environment and installing the Python requirements there.

virtualenv <path_to_your_env>
source <path_to_your_env>/bin/activate
pip install -r requirements.txt

Then follow the installation instructions for a version of PyTorch that works on your machine.

After all the Python requirements are installed, run the following from the top-level directory:

make

The build process requires CMake as well as Make.

After that, source setup.sh from the repo root:

source setup.sh

Consider adding this line to your bashrc so that it runs in every new shell.

You can verify that the install was successful by running the tests from the tests directory:

cd tests
pytest

Run

To train a model, run:

python train.py <path_to_config>

After the model is done training, you can evaluate it with:

python eval.py <path_to_model> <path_to_data_json>

To see the available options for each script, use -h:

python train.py -h
python eval.py -h

Examples

For examples of model configurations and datasets, visit the examples directory. Each example dataset should have instructions and/or scripts for downloading and preparing the data. There should also be one or more model configurations available. The results for each configuration will be documented in each example's corresponding README.md.

Owner

Awni Hannun
