Arabic speech recognition, classification and text-to-speech.

Overview

klaam

Arabic speech recognition, classification and text-to-speech using many advanced models like wave2vec and fastspeech2. This repository allows training and prediction using pretrained models.

Usage

from klaam import SpeechClassification
model = SpeechClassification()
model.classify(wav_file)

from klaam import SpeechRecognition
model = SpeechRecognition()
model.transcribe(wav_file)

from klaam import TextToSpeech
model = TextToSpeech()
model.synthesize(sample_text)

There are two avilable models for recognition trageting MSA and egyptian dialect . You can set any of them using the lang attribute

 from klaam import SpeechRecognition
 model = SpeechRecognition(lang = 'msa')
 model.transcribe('file.wav')

Datasets

Dataset Description link
MGB-3 Egyptian Arabic Speech recognition in the wild. Every sentence was annotated by four annotators. More than 15 hours have been collected from YouTube. requires registeration here
ADI-5 More than 50 hours collected from Aljazeera TV. 4 regional dialectal: Egyptian (EGY), Levantine (LAV), Gulf (GLF), North African (NOR), and Modern Standard Arabic (MSA). This dataset is a part of the MGB-3 challenge. requires registeration here
Common voice Multlilingual dataset avilable on huggingface here.
Arabic Speech Corpus Arabic dataset with alignment and transcriptions here.

Models

We currently support four models, three of them are avilable on transformers.

Language Description Source
Egyptian Speech recognition wav2vec2-large-xlsr-53-arabic-egyptian
Standard Arabic Speech recognition wav2vec2-large-xlsr-53-arabic
EGY, NOR, LAV, GLF, MSA Speech classification wav2vec2-large-xlsr-dialect-classification
Standard Arabic Text-to-Speech fastspeech2

Example Notebooks

Name Description Notebook
Demo Classification, Recongition and Text-to-speech in a few lines of code.
Demo with mic Audio Recongition and classification with recording.

Training

The scripts are a modification of jqueguiner/wav2vec2-sprint.

classification

This script is used for the classification task on the 5 classes.

python run_classifier.py \
   --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
   --output_dir=/path/to/output \
   --cache_dir=/path/to/cache/ \
   --freeze_feature_extractor \
   --num_train_epochs="50" \
   --per_device_train_batch_size="32" \
   --preprocessing_num_workers="1" \
   --learning_rate="3e-5" \
   --warmup_steps="20" \
   --evaluation_strategy="steps"\
   --save_steps="100" \
   --eval_steps="100" \
   --save_total_limit="1" \
   --logging_steps="100" \
   --do_eval \
   --do_train \

Recognition

This script is for training on the dataset for pretraining on the egyption dialects dataset.

python run_mgb3.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps"\
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train \

This script can be used for Arabic common voice training

python run_common_voice.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="ar" \
    --output_dir=/path/to/output/ \
    --cache_dir=/path/to/cache \
    --overwrite_output_dir \
    --num_train_epochs="1" \
    --per_device_train_batch_size="32" \
    --per_device_eval_batch_size="32" \
    --evaluation_strategy="steps" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --fp16 \
    --freeze_feature_extractor \
    --save_steps="10" \
    --eval_steps="10" \
    --save_total_limit="1" \
    --logging_steps="10" \
    --group_by_length \
    --feat_proj_dropout="0.0" \
    --layerdrop="0.1" \
    --gradient_checkpointing \
    --do_train --do_eval \
    --max_train_samples 100 --max_val_samples 100

Text To Speech

We use the pytorch implementation of fastspeech2 by ming024. The procedure is as follows

Download the dataset

wget http://en.arabicspeechcorpus.com/arabic-speech-corpus.zip 
unzip arabic-speech-corpus.zip 

Create multiple directories for data

mkdir -p raw_data/Arabic/Arabic preprocessed_data/Arabic/TextGrid/Arabic
cp arabic-speech-corpus/textgrid/* preprocessed_data/Arabic/TextGrid/Arabic

Prepare metadata

import os 
base_dir = '/content/arabic-speech-corpus'
lines = []
for lab_file in os.listdir(f'{base_dir}/lab'):
  lines.append(lab_file[:-4]+'|'+open(f'{base_dir}/lab/{lab_file}', 'r').read())


open(f'{base_dir}/metadata.csv', 'w').write(('\n').join(lines))

Clone my fork

git clone --depth 1 https://github.com/zaidalyafeai/FastSpeech2
cd FastSpeech2
pip install -r requirements.txt

Prepare alignments and prepreocessed data

python3 prepare_align.py config/Arabic/preprocess.yaml
python3 preprocess.py config/Arabic/preprocess.yaml

Unzip vocoders

unzip hifigan/generator_LJSpeech.pth.tar.zip -d hifigan
unzip hifigan/generator_universal.pth.tar.zip -d hifigan

Start training

python3 train.py -p config/Arabic/preprocess.yaml -m config/Arabic/model.yaml -t config/Arabic/train.yaml
Owner
ARBML
Arabic NLP
ARBML
Chinese NewsTitle Generation Project by GPT2.带有超级详细注释的中文GPT2新闻标题生成项目。

GPT2-NewsTitle 带有超详细注释的GPT2新闻标题生成项目 UpDate 01.02.2021 从网上收集数据,将清华新闻数据、搜狗新闻数据等新闻数据集,以及开源的一些摘要数据进行整理清洗,构建一个较完善的中文摘要数据集。 数据集清洗时,仅进行了简单地规则清洗。

logCong 785 Dec 29, 2022
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy This package provides spaCy components and architectures to use tr

Explosion 1.2k Jan 08, 2023
Tensorflow Implementation of A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Tensorflow Implementation of A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Ankur Dhuriya 10 Oct 13, 2022
Azure Text-to-speech service for Home Assistant

Azure Text-to-speech service for Home Assistant The Azure text-to-speech platform uses online Azure Text-to-Speech cognitive service to read a text wi

Yassine Selmi 2 Aug 06, 2022
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

FNet: Mixing Tokens with Fourier Transforms Pytorch implementation of Fnet : Mixing Tokens with Fourier Transforms. Citation: @misc{leethorp2021fnet,

Rishikesh (ऋषिकेश) 217 Dec 05, 2022
C.J. Hutto 3.8k Dec 30, 2022
[NeurIPS 2021] Code for Learning Signal-Agnostic Manifolds of Neural Fields

Learning Signal-Agnostic Manifolds of Neural Fields This is the uncleaned code for the paper Learning Signal-Agnostic Manifolds of Neural Fields. The

60 Dec 12, 2022
jel - Japanese Entity Linker - is Bi-encoder based entity linker for japanese.

jel: Japanese Entity Linker jel - Japanese Entity Linker - is Bi-encoder based entity linker for japanese. Usage Currently, link and question methods

izuna385 10 Jan 06, 2023
This program do translate english words to portuguese

Python-Dictionary This program is used to translate english words to portuguese. Web-Scraping This program use BeautifulSoap to make web scraping, so

João Assalim 1 Oct 10, 2022
State-of-the-art NLP through transformer models in a modular design and consistent APIs.

Trapper (Transformers wRAPPER) Trapper is an NLP library that aims to make it easier to train transformer based models on downstream tasks. It wraps h

Open Business Software Solutions 42 Sep 21, 2022
Saptak Bhoumik 14 May 24, 2022
🐍 A hyper-fast Python module for reading/writing JSON data using Rust's serde-json.

A hyper-fast, safe Python module to read and write JSON data. Works as a drop-in replacement for Python's built-in json module. This is alpha software

Matthias 479 Jan 01, 2023
Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish Language Models 💃🏻 Corpora 📃 Corpora Number of documents Size (GB) BNE 201,080,084 570GB Models 🤖 RoBERTa-base BNE: https://huggingface.co

PlanTL-SANIDAD 203 Dec 20, 2022
ChainKnowledgeGraph, 产业链知识图谱包括A股上市公司、行业和产品共3类实体

ChainKnowledgeGraph, 产业链知识图谱包括A股上市公司、行业和产品共3类实体,包括上市公司所属行业关系、行业上级关系、产品上游原材料关系、产品下游产品关系、公司主营产品、产品小类共6大类。 上市公司4,654家,行业511个,产品95,559条、上游材料56,824条,上级行业480条,下游产品390条,产品小类52,937条,所属行业3,946条。

liuhuanyong 415 Jan 06, 2023
Code for evaluating Japanese pretrained models provided by NTT Ltd.

japanese-dialog-transformers 日本語の説明文はこちら This repository provides the information necessary to evaluate the Japanese Transformer Encoder-decoder dialo

NTT Communication Science Laboratories 216 Dec 22, 2022
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained mo

Hugging Face 77.2k Jan 03, 2023
Tensorflow implementation of paper: Learning to Diagnose with LSTM Recurrent Neural Networks.

Multilabel time series classification with LSTM Tensorflow implementation of model discussed in the following paper: Learning to Diagnose with LSTM Re

Aaqib 552 Nov 28, 2022
It analyze the sentiment of the user, whether it is postive or negative.

Sentiment-Analyzer-Tool It analyze the sentiment of the user, whether it is postive or negative. It uses streamlit library for creating this sentiment

Paras Patidar 18 Dec 17, 2022
An example project using OpenPrompt under pytorch-lightning for prompt-based SST2 sentiment analysis model

pl_prompt_sst An example project using OpenPrompt under the framework of pytorch-lightning for a training prompt-based text classification model on SS

Zhiling Zhang 5 Oct 21, 2022
تولید اسم های رندوم فینگیلیش

karafs کرفس تولید اسم های رندوم فینگیلیش installation ➜ pip install karafs usage دو زبانه ➜ karafs -n 10 توت فرنگی بی ناموس toot farangi-ye bi_namoos

Vaheed NÆINI (9E) 36 Nov 24, 2022