Code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition"

Related tags

Deep Learningsew
Overview

SEW (Squeezed and Efficient Wav2vec)

made-with-python License: MIT

The repo contains the code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition" by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q Weinberger, and Yoav Artzi.

Model Checkpoints

Unsupervisedly Pre-trained on LibriSpeech 960h

Model Pre-training updates Dataset Model
W2V2-tiny 100K Librispeech 960h download
W2V2-small 100K Librispeech 960h download
W2V2-mid 100K Librispeech 960h download
W2V2-base 100K Librispeech 960h download
SEW-tiny 100K Librispeech 960h download
SEW-small 100K Librispeech 960h download
SEW-mid 100K Librispeech 960h download
SEW-D-tiny 100K Librispeech 960h download
SEW-D-small 100K Librispeech 960h download
SEW-D-mid 100K Librispeech 960h download
SEW-D-mid (k127) 100K Librispeech 960h download
SEW-D-base 100K Librispeech 960h download
SEW-D-base+ 100K Librispeech 960h download
SEW-D-mid 400K Librispeech 960h download
SEW-D-mid (k127) 400K Librispeech 960h download
SEW-D-base+ 400K Librispeech 960h download

Usage

Dependencies

The code is tested with fairseq commit 05255f9, deberta commit bf17ca4 and the following packages.

torch==1.8.0
torchaudio==0.8.0
tqdm==4.49.0
Hydra==2.5
hydra-core==1.0.4
fvcore==0.1.5.post20210330
omegaconf==2.0.5
einops==0.3.0
fire==0.2.1

Apex

Please install NVIDIA's apex with

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

wav2letter decoder

Currently, we are decoding with wav2letter v0.2 python binding at commit 96f5f9d Please install the python binding here https://github.com/flashlight/wav2letter/tree/96f5f9d3b41e01af0a031ee0d2604acd9ef3b1b0/bindings/python The newest commit d5a93f0 in v0.2 branch leads to worse WER for wav2vec 2.0 baselines.

Installation

git clone https://github.com/asappresearch/sew.git
cd sew 
pip install -e .

Pre-training

Pre-training SEW models

Run the following command where $model_size can be tiny, small, or mid, and $ngpu is tne number of GPUs you want to use.

bash scripts/pt-sew.sh $model_size $ngpu

Pre-training SEW-D models

bash scripts/pt-sew-d.sh $model_size $ngpu

where $model_size can be tiny, small, mid, mid-k127, base, or base+.

Fine-tuning

Run the following script to fine-tune a model with the hyperparameters from wav2vec 2.0.

bash scripts/ft-model.sh $pre_trained_model $split $ngpu

where $pre_trained_model can be either a W2V2, SEW, or a SEW-D model checkpoint and $split can be 10m, 1h, 10h, or 100h.

Here we also provide a set of hyperparameters which sets all dropouts the same as the pre-training stage, and we found it to be more stable.

bash scripts/ft-model-stable.sh $pre_trained_model $split $ngpu

If you see out of GPU memory error, please scale down the dataset.max_tokens and scale up the optimization.update_freq in scripts/ft-model.sh. For example modifying these lines

  dataset.max_tokens=3200000 \
  optimization.update_freq="[$((8 / $ngpu))]" \

to

  dataset.max_tokens=1600000 \
  optimization.update_freq="[$((16 / $ngpu))]" \

which reduces the batch size and increases the gradient accumulation steps in order to use less GPU memory.

Evaluation

  1. Please run this script to prepare the official LibriSpeech 4-gram language model.
bash scripts/prepare_librispeech_lm.sh $kenlm_build_bin

where $kenlm_build_bin is the folder that contains the KenLM build_binary executable file (e.g. /home/user/kenlm/build/bin).

  1. Then run this script to evaluate a pre-trained ASR model
python tools/eval_w2v.py tunelm --subsets '["dev-clean", "dev-other", "test-clean", "test-other"]' --model $asr_checkpoint
You might also like...
Code for the paper Learning the Predictability of the Future

Learning the Predictability of the Future Code from the paper Learning the Predictability of the Future. Website of the project in hyperfuture.cs.colu

PyTorch code for the paper: FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning
PyTorch code for the paper: FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning This is the PyTorch implementation of our paper: FeatMatch: Feature-Based Augmentat

Code for the paper A Theoretical Analysis of the Repetition Problem in Text Generation
Code for the paper A Theoretical Analysis of the Repetition Problem in Text Generation

A Theoretical Analysis of the Repetition Problem in Text Generation This repository share the code for the paper "A Theoretical Analysis of the Repeti

Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks
Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks

SA-Net: Shuffle Attention for Deep Convolutional Neural Networks (paper) By Qing-Long Zhang and Yu-Bin Yang [State Key Laboratory for Novel Software T

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.
Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Non-Rigid Neural Radiance Fields This is the official repository for the project "Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synt

Code for the Shortformer model, from the paper by Ofir Press, Noah A. Smith and Mike Lewis.

Shortformer This repository contains the code and the final checkpoint of the Shortformer model. This file explains how to run our experiments on the

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection
PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection

Unbiased Teacher for Semi-Supervised Object Detection This is the PyTorch implementation of our paper: Unbiased Teacher for Semi-Supervised Object Detection

Official code for paper "Optimization for Oriented Object Detection via Representation Invariance Loss".

Optimization for Oriented Object Detection via Representation Invariance Loss By Qi Ming, Zhiqiang Zhou, Lingjuan Miao, Xue Yang, and Yunpeng Dong. Th

Code for our CVPR 2021 paper
Code for our CVPR 2021 paper "MetaCam+DSCE"

Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification (CVPR'21) Introduction Code for our CVPR 2021

Comments
  • 8000 sample rate audio

    8000 sample rate audio

    Hello there,

    I'm trying to train on 8000 Hz sample rate audio dataset. Is it enough to simply add task.sample_rate=8000 to the fairseq command or there are additional config changes that I should make?

    I would much appreciate any advice

    Thank you

    opened by Mega4alik 0
  • How to train using not English Languages

    How to train using not English Languages

    Hi! Thank you for the awesome model!

    We are very interested in your project and we try to use the sew for Japanese Language. When we train the model, should we use these scripts? Thanks! https://github.com/asappresearch/sew/tree/master/scripts

    opened by jigenji 1
  • :bug: Fix padding mask calculation

    :bug: Fix padding mask calculation

    This PR updates the padding mask calculation to be the same as the one in the reference Wav2Vec2 implementation (same commit as listed in SEW's README): https://github.com/pytorch/fairseq/blob/05255f96410e5b1eaf3bf59b767d5b4b7e2c3a35/fairseq/models/wav2vec/wav2vec2.py#L477

    For more details on how and why it was fixed in fairseq, check out this PR by @patrickvonplaten https://github.com/pytorch/fairseq/pull/3228

    opened by anton-l 0
Releases(v0.0.1)
Owner
ASAPP Research
AI for Enterprise
ASAPP Research
Official implementation of Neural Bellman-Ford Networks (NeurIPS 2021)

NBFNet: Neural Bellman-Ford Networks This is the official codebase of the paper Neural Bellman-Ford Networks: A General Graph Neural Network Framework

MilaGraph 136 Dec 21, 2022
Data augmentation for NLP, accepted at EMNLP 2021 Findings

AEDA: An Easier Data Augmentation Technique for Text Classification This is the code for the EMNLP 2021 paper AEDA: An Easier Data Augmentation Techni

Akbar Karimi 81 Dec 09, 2022
classify fashion-mnist dataset with pytorch

Fashion-Mnist Classifier with PyTorch Inference 1- clone this repository: git clone https://github.com/Jhamed7/Fashion-Mnist-Classifier.git 2- Instal

1 Jan 14, 2022
TensorFlow-LiveLessons - "Deep Learning with TensorFlow" LiveLessons

TensorFlow-LiveLessons Note that the second edition of this video series is now available here. The second edition contains all of the content from th

Deep Learning Study Group 830 Jan 03, 2023
Deep-Learning-Book-Chapter-Summaries - Attempting to make the Deep Learning Book easier to understand.

Deep-Learning-Book-Chapter-Summaries This repository provides a summary for each chapter of the Deep Learning book by Ian Goodfellow, Yoshua Bengio an

Aman Dalmia 1k Dec 27, 2022
Simple and understandable swin-transformer OCR project

swin-transformer-ocr ocr with swin-transformer Overview Simple and understandable swin-transformer OCR project. The model in this repository heavily r

Ha YongWook 67 Dec 31, 2022
Code for GNMR in ICDE 2021

GNMR Code for GNMR in ICDE 2021 Please unzip data files in Datasets/MultiInt-ML10M first. Run labcode_preSamp.py (with graph sampling) for ECommerce-c

7 Oct 27, 2022
Streamlit component for TensorBoard, TensorFlow's visualization toolkit

streamlit-tensorboard This is a work-in-progress, providing a function to embed TensorBoard, TensorFlow's visualization toolkit, in Streamlit apps. In

Snehan Kekre 27 Nov 13, 2022
A basic neural network for image segmentation.

Unet_erythema_detection A basic neural network for image segmentation. 前期准备 1.在logs文件夹中下载h5权重文件,百度网盘链接在logs文件夹中 2.将所有原图 放置在“/dataset_1/JPEGImages/”文件夹

1 Jan 16, 2022
Exadel CompreFace is a free and open-source face recognition GitHub project

Exadel CompreFace is a leading free and open-source face recognition system Exadel CompreFace is a free and open-source face recognition service that

Exadel 2.6k Jan 04, 2023
CCP dataset from Clothing Co-Parsing by Joint Image Segmentation and Labeling

Clothing Co-Parsing (CCP) Dataset Clothing Co-Parsing (CCP) dataset is a new clothing database including elaborately annotated clothing items. 2, 098

Wei Yang 434 Dec 24, 2022
Official implementation of "MetaSDF: Meta-learning Signed Distance Functions"

MetaSDF: Meta-learning Signed Distance Functions Project Page | Paper | Data Vincent Sitzmann*, Eric Ryan Chan*, Richard Tucker, Noah Snavely Gordon W

Vincent Sitzmann 100 Jan 01, 2023
A PaddlePaddle implementation of Time Interval Aware Self-Attentive Sequential Recommendation.

TiSASRec.paddle A PaddlePaddle implementation of Time Interval Aware Self-Attentive Sequential Recommendation. Introduction 论文:Time Interval Aware Sel

Paddorch 2 Nov 28, 2021
High level network definitions with pre-trained weights in TensorFlow

TensorNets High level network definitions with pre-trained weights in TensorFlow (tested with 2.1.0 = TF = 1.4.0). Guiding principles Applicability.

Taehoon Lee 1k Dec 13, 2022
Springer Link Download Module for Python

♞ pupalink A simple Python module to search and download books from SpringerLink. 🧪 This project is still in an early stage of development. Expect br

Pupa Corp. 18 Nov 21, 2022
Final project code: Implementing BicycleGAN, for CIS680 FA21 at University of Pennsylvania

680 Final Project: BicycleGAN Haoran Tang Instructions 1. Training To train the network, please run train.py. Change hyper-parameters and folder paths

Haoran Tang 0 Apr 22, 2022
PlenOctree Extraction algorithm

PlenOctrees_NeRF-SH This is an implementation of the Paper PlenOctrees for Real-time Rendering of Neural Radiance Fields. Not only the code provides t

49 Nov 05, 2022
BrainGNN - A deep learning model for data-driven discovery of functional connectivity

A deep learning model for data-driven discovery of functional connectivity https://doi.org/10.3390/a14030075 Usman Mahmood, Zengin Fu, Vince D. Calhou

Usman Mahmood 3 Aug 28, 2022
Convolutional Neural Networks

Darknet Darknet is an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation. D

Joseph Redmon 23.7k Jan 05, 2023
An unofficial personal implementation of UM-Adapt, specifically to tackle joint estimation of panoptic segmentation and depth prediction for autonomous driving datasets.

Semisupervised Multitask Learning This repository is an unofficial and slightly modified implementation of UM-Adapt[1] using PyTorch. This code primar

Abhinav Atrishi 11 Nov 25, 2022