Arabic speech recognition, classification and text-to-speech.

Last update: Dec 27, 2022

Overview

klaam

Arabic speech recognition, classification and text-to-speech using many advanced models like wave2vec and fastspeech2. This repository allows training and prediction using pretrained models.

Usage

from klaam import SpeechClassification
model = SpeechClassification()
model.classify(wav_file)

from klaam import SpeechRecognition
model = SpeechRecognition()
model.transcribe(wav_file)

from klaam import TextToSpeech
model = TextToSpeech()
model.synthesize(sample_text)

There are two avilable models for recognition trageting MSA and egyptian dialect . You can set any of them using the lang attribute

 from klaam import SpeechRecognition
 model = SpeechRecognition(lang = 'msa')
 model.transcribe('file.wav')

Datasets

Dataset	Description	link
MGB-3	Egyptian Arabic Speech recognition in the wild. Every sentence was annotated by four annotators. More than 15 hours have been collected from YouTube.	requires registeration here
ADI-5	More than 50 hours collected from Aljazeera TV. 4 regional dialectal: Egyptian (EGY), Levantine (LAV), Gulf (GLF), North African (NOR), and Modern Standard Arabic (MSA). This dataset is a part of the MGB-3 challenge.	requires registeration here
Common voice	Multlilingual dataset avilable on huggingface	here.
Arabic Speech Corpus	Arabic dataset with alignment and transcriptions	here.

Models

We currently support four models, three of them are avilable on transformers.

Language	Description	Source
Egyptian	Speech recognition	wav2vec2-large-xlsr-53-arabic-egyptian
Standard Arabic	Speech recognition	wav2vec2-large-xlsr-53-arabic
EGY, NOR, LAV, GLF, MSA	Speech classification	wav2vec2-large-xlsr-dialect-classification
Standard Arabic	Text-to-Speech	fastspeech2

Example Notebooks

Name	Description	Notebook
Demo	Classification, Recongition and Text-to-speech in a few lines of code.
Demo with mic	Audio Recongition and classification with recording.

Training

The scripts are a modification of jqueguiner/wav2vec2-sprint.

classification

This script is used for the classification task on the 5 classes.

python run_classifier.py \
   --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
   --output_dir=/path/to/output \
   --cache_dir=/path/to/cache/ \
   --freeze_feature_extractor \
   --num_train_epochs="50" \
   --per_device_train_batch_size="32" \
   --preprocessing_num_workers="1" \
   --learning_rate="3e-5" \
   --warmup_steps="20" \
   --evaluation_strategy="steps"\
   --save_steps="100" \
   --eval_steps="100" \
   --save_total_limit="1" \
   --logging_steps="100" \
   --do_eval \
   --do_train \

Recognition

This script is for training on the dataset for pretraining on the egyption dialects dataset.

python run_mgb3.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps"\
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train \

This script can be used for Arabic common voice training

python run_common_voice.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="ar" \
    --output_dir=/path/to/output/ \
    --cache_dir=/path/to/cache \
    --overwrite_output_dir \
    --num_train_epochs="1" \
    --per_device_train_batch_size="32" \
    --per_device_eval_batch_size="32" \
    --evaluation_strategy="steps" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --fp16 \
    --freeze_feature_extractor \
    --save_steps="10" \
    --eval_steps="10" \
    --save_total_limit="1" \
    --logging_steps="10" \
    --group_by_length \
    --feat_proj_dropout="0.0" \
    --layerdrop="0.1" \
    --gradient_checkpointing \
    --do_train --do_eval \
    --max_train_samples 100 --max_val_samples 100

Text To Speech

We use the pytorch implementation of fastspeech2 by ming024. The procedure is as follows

Download the dataset

wget http://en.arabicspeechcorpus.com/arabic-speech-corpus.zip 
unzip arabic-speech-corpus.zip

Create multiple directories for data

mkdir -p raw_data/Arabic/Arabic preprocessed_data/Arabic/TextGrid/Arabic
cp arabic-speech-corpus/textgrid/* preprocessed_data/Arabic/TextGrid/Arabic

Prepare metadata

import os 
base_dir = '/content/arabic-speech-corpus'
lines = []
for lab_file in os.listdir(f'{base_dir}/lab'):
  lines.append(lab_file[:-4]+'|'+open(f'{base_dir}/lab/{lab_file}', 'r').read())


open(f'{base_dir}/metadata.csv', 'w').write(('\n').join(lines))

Clone my fork

git clone --depth 1 https://github.com/zaidalyafeai/FastSpeech2
cd FastSpeech2
pip install -r requirements.txt

Prepare alignments and prepreocessed data

python3 prepare_align.py config/Arabic/preprocess.yaml
python3 preprocess.py config/Arabic/preprocess.yaml

Unzip vocoders

unzip hifigan/generator_LJSpeech.pth.tar.zip -d hifigan
unzip hifigan/generator_universal.pth.tar.zip -d hifigan

Start training

python3 train.py -p config/Arabic/preprocess.yaml -m config/Arabic/model.yaml -t config/Arabic/train.yaml

Arabic speech recognition, classification and text-to-speech.

Related tags

Overview

klaam

Usage

Datasets

Models

Example Notebooks

Training

classification

Recognition

Text To Speech

Owner

ARBML

Athena is an open-source implementation of end-to-end speech processing engine.

The model is designed to train a single and large neural network in order to predict correct translation by reading the given sentence.

Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingwai

Training code for Korean multi-class sentiment analysis

Line as a Visual Sentence: Context-aware Line Descriptor for Visual Localization

This is my reading list for my PhD in AI, NLP, Deep Learning and more.

Amazon Multilingual Counterfactual Dataset (AMCD)

A list of NLP(Natural Language Processing) tutorials built on Tensorflow 2.0.

💛 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"

Basic yet complete Machine Learning pipeline for NLP tasks

A website which allows you to play with the GPT-2 transformer

Graph Coloring - Weighted Vertex Coloring Problem

A Practitioner's Guide to Natural Language Processing

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx

Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products

使用pytorch+transformers复现了SimCSE论文中的有监督训练和无监督训练方法

State-of-the-art NLP through transformer models in a modular design and consistent APIs.

Knowledge Management for Humans using Machine Learning & Tags

This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand how to use NLP for text feature engineering.

Multiple implementations for abstractive text summurization , using google colab