Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition

Last update: Dec 29, 2022

Overview

Wav2Vec2 STT Python

Beta Software


Requirements:

Python 3.7+
Platform: Linux x64 (Windows is a work in progress; macOS may work; PRs welcome)
Python package requirements: cffi, numpy
A wav2vec 2.0 model (must be converted to a compatible format)
- Several are available ready-to-go on this project's releases page and below.
- You can convert your own models by following the instructions here.
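Before installing, the requirements listed above can be checked from Python. The following is a minimal sketch; the function name `check_requirements` is hypothetical and not part of wav2vec2_stt. It only inspects the interpreter version, the platform, and whether `cffi` and `numpy` are importable, so it says nothing about whether a model has been converted correctly.

```python
import importlib.util
import platform
import sys


def check_requirements():
    """Hypothetical helper: report which of the stated requirements are met."""
    return {
        # Python 3.7 or newer
        "python_3_7_plus": sys.version_info >= (3, 7),
        # Linux x64 is the supported platform; others may or may not work
        "linux_x64": platform.system() == "Linux" and platform.machine() == "x86_64",
        # Direct Python package dependencies
        "cffi_installed": importlib.util.find_spec("cffi") is not None,
        "numpy_installed": importlib.util.find_spec("numpy") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```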

Models:

Model	Download Size
Facebook Wav2Vec2 2.0 Base (960h)	360 MB
Facebook Wav2Vec2 2.0 Large (960h)	1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 (960h)	1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 Self (960h)	1.18 GB

Usage

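from wav2vec2_stt import Wav2Vec2STT
import wave

# Load a model (converted to the compatible format) from a local directory
decoder = Wav2Vec2STT('model_dir')

# Read all raw audio frames (as bytes) from the WAV file
with wave.open('tests/test.wav', 'rb') as wav_file:
    wav_samples = wav_file.readframes(wav_file.getnframes())

# Decode, then normalize whitespace and case before comparing
assert decoder.decode(wav_samples).strip().lower() == 'it depends on the context'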
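The released wav2vec 2.0 models were trained on 16 kHz audio. The sketch below checks a WAV file's format before handing its frames to the decoder. It assumes (this is not confirmed by the library's documentation) that `decode` expects 16 kHz, mono, 16-bit PCM bytes; the helper name `read_wav_for_wav2vec2` is hypothetical.

```python
import wave


def read_wav_for_wav2vec2(path, expected_rate=16000):
    """Hypothetical helper: return raw PCM frames after validating the format.

    Assumption: the decoder wants 16 kHz, mono, 16-bit PCM audio.
    """
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()  # bytes per sample
        frame_rate = wav_file.getframerate()
        if channels != 1:
            raise ValueError(f"expected mono audio, got {channels} channels")
        if sample_width != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * sample_width}-bit")
        if frame_rate != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {frame_rate} Hz")
        return wav_file.readframes(wav_file.getnframes())
```

With a decoder loaded as above, this would be used as `decoder.decode(read_wav_for_wav2vec2('tests/test.wav'))`. Files in other formats can be converted beforehand, for example with `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav`.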

The package also provides a simple command-line interface (CLI) for recognizing WAV files:

$ python -m wav2vec2_stt decode model test.wav
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt decode model test.wav test.wav
IT DEPENDS ON THE CONTEXT
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt -h
usage: python -m wav2vec2_stt [-h] {decode} ...

positional arguments:
  {decode}    sub-command
    decode    decode one or more WAV files

optional arguments:
  -h, --help  show this help message and exit
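To drive the CLI from another Python program, it can be run in a subprocess. This is a minimal sketch; the helper names are hypothetical, and it assumes the `decode` sub-command prints one transcript line per input file, in input order, as in the example output above.

```python
import subprocess
import sys


def build_decode_command(model_dir, wav_paths):
    """Build argv for `python -m wav2vec2_stt decode MODEL WAV...`."""
    return [sys.executable, "-m", "wav2vec2_stt", "decode", model_dir, *wav_paths]


def pair_transcripts(wav_paths, stdout):
    """Pair each input file with its transcript line.

    Assumes one output line per input file, in the same order.
    Returns a list (not a dict) so repeated file names are kept.
    """
    lines = stdout.splitlines()
    if len(lines) != len(wav_paths):
        raise ValueError(f"expected {len(wav_paths)} transcripts, got {len(lines)}")
    return list(zip(wav_paths, lines))


def decode_files(model_dir, wav_paths):
    """Run the CLI once for all files and return [(wav_path, transcript), ...]."""
    result = subprocess.run(
        build_decode_command(model_dir, wav_paths),
        stdout=subprocess.PIPE,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return pair_transcripts(wav_paths, result.stdout)
```

Decoding several files in one invocation, as the CLI example does, means the model is loaded only once per batch rather than once per file.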

Installation/Building

The recommended installation method is the prebuilt binary wheel from PyPI, installed with pip (a recent version of pip is required):

python -m pip install wav2vec2_stt

See setup.py for more details on building it yourself.
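After installing, one way to confirm which version pip installed is the standard library's `importlib.metadata` module. This sketch needs Python 3.8+ (on 3.7 the `importlib_metadata` backport would be needed instead); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="wav2vec2_stt"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"wav2vec2_stt {v}" if v else "wav2vec2_stt is not installed")
```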

Author

David Zurow (@daanzu)

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.

Acknowledgments

Contains and uses code from PyTorch and torchaudio, licensed under the BSD 2-Clause License.

Comments

provide API for returning output from intermediate layers

It would be very helpful to have an API for returning output from intermediate layers, for example the layer just before the final layers. This output could be used for speech tasks other than speech recognition.

Opened by zhouyong64

You might also like...

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

13.6k Jan 5, 2023

TGCLOUD: Simple Telegram bot to convert files into direct download links. You can use Telegram as a file server 🪁

6 Oct 18, 2022

PyStanfordDependencies: Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Dependencies

64 May 8, 2022

Lightning ASR: Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

40 Sep 19, 2022

POS-Tagger: This repository details the steps in creating a Part-of-Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

1 Dec 9, 2021

pyannote.audio: Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

2.2k Jan 9, 2023

FastSpeech 2: PyTorch implementation of Microsoft's text-to-speech system "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech".

1k Dec 30, 2022

STEMM: Code for the ACL 2022 main conference paper "STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation".

29 Oct 16, 2022

GenSen: Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

309 Oct 19, 2022


Releases (v0.2.0)

v0.2.0 (Aug 16, 2021)

models (Aug 2, 2021)


HAN2HAN : Hangul Font Generation

Sploitus - Command line search tool for sploitus.com. Think searchsploit, but with more POCs

Ray-based parallel data preprocessing for NLP and ML.

Cherche (search in French) allows you to create a neural search pipeline using retrievers and pre-trained language models as rankers.

Open-source offline translation library written in Python. Uses OpenNMT for translations

Rank-One Model Editing for Locating and Editing Factual Knowledge in GPT

Code for Findings at EMNLP 2021 paper: "Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning"

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

LeBenchmark: a reproducible framework for assessing SSL from speech

novel deep learning research works with PaddlePaddle

Torchrecipes provides a set of reproducible, reusable, ready-to-run RECIPES for training different types of models, across multiple domains, on PyTorch Lightning.

Every Google, Azure & IBM text to speech voice for free

A Python/Pytorch app for easily synthesising human voices

pyMorfologik - Python binding for Morfologik.

I can help you convert your images to a PDF file.

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.

Knowledge Management for Humans using Machine Learning & Tags

The Classical Language Toolkit

STonKGs is a Sophisticated Transformer that can be jointly trained on biomedical text and knowledge graphs