Transformers Wav2Vec2 + Parlance's CTCDecode

Last update: Jul 21, 2022

Overview

🤗 Transformers Wav2Vec2 + Parlance's CTCDecode

Introduction

This repo shows how 🤗 Transformers can be used in combination with Parlance's ctcdecode and a KenLM n-gram language model as a simple way to improve (i.e. lower) the word error rate (WER).

Included is a file to create an ngram with KenLM as well as a simple evaluation script to compare the results of using Wav2Vec2 with ctcdecode + KenLM vs. without using any language model.
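For orientation, the "no language model" baseline amounts to greedy CTC decoding: take the most likely token per audio frame, collapse repeats, and drop the blank token. The following is a minimal sketch of that idea, not the code in eval.py; the toy vocabulary and the blank id of 0 are assumptions made for illustration.

```python
from itertools import groupby


def greedy_ctc_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then drop CTC blank tokens."""
    # groupby merges runs of identical consecutive ids into a single id.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary: id 0 is the CTC blank.
vocab = {0: "", 1: "k", 2: "o", 3: "t"}
# Per-frame argmax ids, as they might come out of an acoustic model.
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
decoded = greedy_ctc_decode(frames, vocab)
```

A beam-search decoder with an n-gram replaces the per-frame argmax with a search over several candidate transcriptions that are also scored by the language model.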

Note: The scripts are written to be used on GPU. If you want to use a CPU instead, simply remove all .to("cuda") occurrences in eval.py.
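As an alternative to deleting each call by hand, one could pick the device once and pass it everywhere. This is only a sketch: the variable name device does not appear in eval.py, and the try/except is there solely so the snippet also runs where PyTorch is not installed.

```python
# Choose the device once; fall back to CPU when CUDA (or PyTorch) is missing.
try:
    import torch

    cuda_ok = torch.cuda.is_available()
except ImportError:
    cuda_ok = False

device = "cuda" if cuda_ok else "cpu"

# In the script, calls such as model.to("cuda") would then become
# model.to(device), and the same for input tensors.
print(f"Running on: {device}")
```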

Installation

As a first step, one should install KenLM. For Ubuntu, it should be enough to follow the installation steps described here. The installed kenlm folder should be moved into this repo for ./create_ngram.py to function correctly. Alternatively, one can link the lmplz binary to an lmplz shell command so that lmplz can be run directly instead of ./kenlm/build/bin/lmplz.
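The two ways of making lmplz available (on the shell path, or inside the repo under ./kenlm/build/bin/) can be checked with a few lines of Python. The helper name find_lmplz is hypothetical and is not part of this repo.

```python
import shutil
from pathlib import Path


def find_lmplz(repo_root="."):
    """Return a usable path to the lmplz binary, or None if it is not found."""
    # Option 1: lmplz has been linked so that it is available on PATH.
    on_path = shutil.which("lmplz")
    if on_path:
        return on_path
    # Option 2: the kenlm folder was moved into the repo as described above.
    local = Path(repo_root) / "kenlm" / "build" / "bin" / "lmplz"
    return str(local) if local.is_file() else None


lmplz_path = find_lmplz()
print(lmplz_path or "lmplz not found; build KenLM first")
```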

Next, some Python dependencies should be installed. Assuming PyTorch is installed, it should be sufficient to run pip install -r requirements.txt.

Run evaluation

Create ngram

As a first step, one should create an ngram. E.g. for Polish the command would be:

./create_ngram.py --language polish --path_to_ngram polish.arpa
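Under the hood, building an n-gram with KenLM comes down to piping a text file (one sentence per line) through lmplz. The sketch below shows that call from Python. How create_ngram.py actually prepares its text corpus is not shown here, so corpus.txt is a hypothetical file name; the order of 5 is taken from the five n-gram orders in the sample file in the next step.

```python
import subprocess


def build_lmplz_command(lmplz="./kenlm/build/bin/lmplz", order=5):
    """Command line for lmplz; -o sets the n-gram order."""
    return [lmplz, "-o", str(order)]


def train_ngram(text_path, arpa_path, lmplz="./kenlm/build/bin/lmplz", order=5):
    """Feed the corpus to lmplz on stdin and write the ARPA file from stdout."""
    with open(text_path, "rb") as src, open(arpa_path, "wb") as dst:
        subprocess.run(
            build_lmplz_command(lmplz, order), stdin=src, stdout=dst, check=True
        )


# Example call (requires a built KenLM and a corpus file, both assumed):
# train_ngram("corpus.txt", "polish.arpa")
```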

After the language model is created, one should open the file. The file should have a structure which looks more or less as follows:

\data\
ngram 1=86586
ngram 2=546387
ngram 3=796581
ngram 4=843999
ngram 5=850874

\1-grams:
-5.7532206      <unk>   0
0       <s>     -0.06677356
-3.4645514      drugi   -0.2088903
...

Now it is very important to also add a </s> token to the n-gram so that it can be loaded correctly. You can simply copy the line:

0 <s> -0.06677356

and change <s> to </s>. When doing this, you should also increase the ngram 1 count by 1. The new ngram should look as follows:

\data\
ngram 1=86587
ngram 2=546387
ngram 3=796581
ngram 4=843999
ngram 5=850874

\1-grams:
-5.7532206      <unk>   0
0       <s>     -0.06677356
0       </s>    -0.06677356
-3.4645514      drugi   -0.2088903
...

Now the ngram can be used correctly with pyctcdecode.
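The manual edit can also be scripted. The following sketch copies the ARPA file, bumps the unigram count by one and duplicates the <s> unigram line as </s>. The function name add_eos_token and the output file name are hypothetical, and the sketch assumes the input file does not already contain a </s> unigram.

```python
def add_eos_token(arpa_in, arpa_out):
    """Copy an ARPA file, adding a </s> unigram next to the <s> unigram."""
    has_added_eos = False
    with open(arpa_in, "r", encoding="utf-8") as src, open(
        arpa_out, "w", encoding="utf-8"
    ) as dst:
        for line in src:
            if line.startswith("ngram 1="):
                # One extra unigram is being added, so the header count grows by 1.
                count = int(line.strip().split("=")[-1])
                dst.write(f"ngram 1={count + 1}\n")
            elif not has_added_eos and "<s>" in line:
                # The first line containing <s> is its unigram entry.
                dst.write(line)
                dst.write(line.replace("<s>", "</s>"))
                has_added_eos = True
            else:
                dst.write(line)


# Example call (file names are assumptions):
# add_eos_token("polish.arpa", "polish_fixed.arpa")
```

Writing to a second file keeps the original KenLM output untouched in case the edit needs to be redone.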

Run eval

Having created the ngram, one can run:

./eval.py --language polish --path_to_ngram polish.arpa

to compare Wav2Vec2 + LM vs. Wav2Vec2 + No LM on Polish.

Results

==================================================polish==================================================
polish - No LM - | WER: 0.3069742867206763 | CER: 0.06054530156286364 | Time: 32.37423086166382
polish - With LM - | WER: 0.39526828695550076 | CER: 0.17596985266474516 | Time: 62.017329692840576
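For readers unfamiliar with the two metrics: WER is the word-level edit distance between reference and prediction divided by the number of reference words, and CER is the same at character level. Lower is better for both. The sketch below computes them for a single sentence pair in plain Python. It is not the metric code used by eval.py, and a corpus-level score would sum the edits and reference lengths over all utterances before dividing.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for one non-empty reference sentence."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for one non-empty reference sentence."""
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong word out of three, made-up example sentences.
print(wer("ala ma kota", "ala ma psa"))
print(cer("ala ma kota", "ala ma psa"))
```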

I didn't obtain any good results: on Polish, adding the LM raised WER from 0.307 to 0.395 and roughly doubled the decoding time, even when trying out a variety of different settings for alpha and beta. Sadly, there aren't many examples, tutorials or docs on parlance/ctcdecode, so it's hard to find the reason for the problem.
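A systematic way to try settings is a small grid search over alpha (the weight on the language model score) and beta (the weight on the number of words in a beam). The sketch below is generic: the evaluate callable is hypothetical and stands in for a function that decodes a held-out set with the given weights and returns the WER. The dummy function only makes the snippet runnable and says nothing about real results.

```python
from itertools import product


def grid_search(alphas, betas, evaluate):
    """Return (best_wer, alpha, beta) over all combinations; lower WER wins."""
    return min((evaluate(a, b), a, b) for a, b in product(alphas, betas))


def dummy_evaluate(alpha, beta):
    """Stand-in for a real decode-and-score run; NOT real data."""
    return (alpha - 0.5) ** 2 + (beta - 1.0) ** 2


best = grid_search([0.0, 0.5, 1.0], [0.0, 1.0, 2.0], dummy_evaluate)
print(best)
```

Including alpha = 0 in the grid is a cheap sanity check: with the language model weighted to zero, the decoder should land close to the No LM numbers, and a large gap would point at the decoding setup (labels, blank id, input format) rather than the n-gram.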

I also tried it out for other languages like Portuguese and Spanish, but no luck there either.

Owner

Patrick von Platen
