wav2vec_finetune

Test finetuning of XLSR (multilingual wav2vec 2.0) for other speech classification tasks

Initial test: gender recognition on this dataset.
Finetune for autism detection
[] Clean up directory
[] Make training and evaluation scripts runnable with cmd line / shell scripts
[] Add random noise on training samples
[] Make baseline models

# make virtual env
pip install -r requirements.txt

mkdir data
mkdir preproc_data
mkdir model
cd data
wget https://zenodo.org/record/1219621/files/CaFE_48k.zip?download=1
unzip the file 

python preproc.py
python train.py
python evaluate.py

Updates

11/9: success! Trained a sex classifier on a small dataset that performs soso. Everything seems to work though.

TODO

Chunk audio files - make predictions in batches of e.g. 5 seconds
Set up benchmark models

Resources:

https://github.com/pytorch/fairseq/blob/master/examples/xlmr/README.md
https://arxiv.org/abs/2006.13979
https://huggingface.co/transformers/training.html
https://huggingface.co/blog/fine-tune-xlsr-wav2vec2
https://discuss.huggingface.co/t/german-asr-fine-tuning-wav2vec2/4558/5
https://huggingface.co/docs/datasets/loading_datasets.html#from-local-files
https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md
https://github.com/m3hrdadfi/soxan
https://www.zhaw.ch/storage/engineering/institute-zentren/cai/BA21_Speech_Classification_Reiser_Fivian.pdf
https://github.com/DReiser7/w2v_did
https://github.com/ARBML/klaam
https://github.com/talhanai/speech-nlp-datasets

Notes:

Look into SpecAugment for finetuning: https://arxiv.org/abs/1904.08779 (on by default)
How to make the prediction?
- Better way than a small feedforward projection? (LSTM or something?)

Test finetuning of XLSR (multilingual wav2vec 2.0) for other speech classification tasks

Related tags

Overview

wav2vec_finetune

Updates

TODO

Resources:

Notes:

Owner

A design of MIDI language for music generation task, specifically for Natural Language Processing (NLP) models.

Shirt Bot is a discord bot which uses GPT-3 to generate text

Python package for performing Entity and Text Matching using Deep Learning.

Python functions for summarizing and improving voice dictation input.

Beyond the Imitation Game collaborative benchmark for enormous language models

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

Ukrainian TTS (text-to-speech) using Coqui TTS

ProteinBERT is a universal protein language model pretrained on ~106M proteins from the UniRef90 dataset.

Multi Task Vision and Language

An easier way to build neural search on the cloud

An open-source NLP library: fast text cleaning and preprocessing.

A tool helps build a talk preview image by combining the given background image and talk event description

A simple version of DeTR

[ICCV 2021] Instance-level Image Retrieval using Reranking Transformers

Command Line Text-To-Speech using Google TTS

ConferencingSpeech2022; Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge

Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021

IMDB film review sentiment classification based on BERT's supervised learning model.

100+ Chinese Word Vectors 上百种预训练中文词向量

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities