DziriBERT: a Pre-trained Language Model for the Algerian Dialect

Last update: Jan 07, 2023

Overview

DziriBERT

DziriBERT is the first Transformer-based language model pre-trained specifically for the Algerian dialect. It handles Algerian text written in both Arabic and Latin characters. It sets new state-of-the-art results on Algerian text classification datasets, even though it was pre-trained on much less data (~1 million tweets) than the models it is compared against.

The model is publicly available at: https://huggingface.co/alger-ia/dziribert.

For more information, please see our paper: https://arxiv.org/pdf/2109.12346.pdf

Evaluation

The Twifil dataset was used to compare DziriBERT with current multilingual, standard Arabic and dialectal Arabic models:

Model	Sentiment acc.	Emotion acc.
bert-base-multilingual-cased	73.6 %	59.4 %
aubmindlab/bert-base-arabert	72.1 %	61.2 %
CAMeL-Lab/bert-base-arabic-camelbert-mix	77.1 %	65.7 %
qarib/bert-base-qarib	77.7 %	67.6 %
UBC-NLP/MARBERT	80.1 %	68.4 %
alger-ia/dziribert	80.3 %	69.3 %
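
To make the comparison concrete, the short sketch below computes how many accuracy points DziriBERT gains over each baseline. The figures are copied from the table above; the `RESULTS` dictionary and the `margins` helper are names introduced here for illustration and are not part of the repository.

```python
# Accuracy figures (percent) copied from the table above:
# model name -> (sentiment accuracy, emotion accuracy)
RESULTS = {
    "bert-base-multilingual-cased": (73.6, 59.4),
    "aubmindlab/bert-base-arabert": (72.1, 61.2),
    "CAMeL-Lab/bert-base-arabic-camelbert-mix": (77.1, 65.7),
    "qarib/bert-base-qarib": (77.7, 67.6),
    "UBC-NLP/MARBERT": (80.1, 68.4),
    "alger-ia/dziribert": (80.3, 69.3),
}


def margins(results, reference="alger-ia/dziribert"):
    """Return {model: (sentiment_gain, emotion_gain)} of `reference`
    over every other model, in accuracy points."""
    ref_sent, ref_emo = results[reference]
    return {
        name: (round(ref_sent - sent, 1), round(ref_emo - emo, 1))
        for name, (sent, emo) in results.items()
        if name != reference
    }


for name, (sent_gain, emo_gain) in margins(RESULTS).items():
    print(f"{name}: +{sent_gain} sentiment, +{emo_gain} emotion")
```

The closest baseline is UBC-NLP/MARBERT, which DziriBERT exceeds by 0.2 points on sentiment and 0.9 points on emotion classification.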

In order to reproduce these results, please install the following requirements:

pip install -r requirements.txt

Then, run the following evaluation script:

python3 evaluate_model.py

These results were obtained on a Tesla K80 GPU.

Pretrained DziriBERT

DziriBERT has been uploaded to the HuggingFace hub in order to facilitate its use: https://huggingface.co/alger-ia/dziribert.

It can be easily downloaded and loaded using the transformers library:

from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("alger-ia/dziribert")
model = BertForMaskedLM.from_pretrained("alger-ia/dziribert")
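
Once the model is available, a quick way to try it is masked-token prediction. The following is a minimal sketch, assuming the `fill-mask` pipeline from the transformers library; the helper names `top_predictions` and `predict_masked` are hypothetical and introduced only for this example.

```python
def top_predictions(predictions, k=3):
    """Return the k highest-scoring (token, score) pairs from the
    list of dicts produced by a fill-mask pipeline."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(p["score"], 4)) for p in ranked[:k]]


def predict_masked(sentence, model_name="alger-ia/dziribert", top_k=5):
    """Fill the single [MASK] token in `sentence` and return the best candidates.

    The model is downloaded from the HuggingFace hub on first use.
    """
    # Imported lazily so that top_predictions stays usable without transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_name)
    return top_predictions(fill_mask(sentence, top_k=top_k), k=top_k)
```

For example, `predict_masked("أنا من الجزائر من ولاية [MASK]")` (roughly "I am from Algeria, from the province of [MASK]") should return a list of candidate tokens with their scores. The sentence is only an illustrative input, and the exact predictions depend on the model version.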

How to cite

@article{dziribert,
  title={DziriBERT: a Pre-trained Language Model for the Algerian Dialect},
  author={Abdaoui, Amine and Berrimi, Mohamed and Oussalah, Mourad and Moussaoui, Abdelouahab},
  journal={arXiv preprint arXiv:2109.12346},
  year={2021}
}

Contact

Please contact [email protected] with any questions, feedback, or requests.
