darija <-> english dictionary

Related tags

Deep Learningdataset
Overview

darija-dictionary

Having advanced IT solutions that are well adapted to the Moroccan context passes inevitably through understanding Moroccan dialect. Hence, darija (Moroccan dialect) should be an active player in the domain of Natural Language Processing (NLP).

However, it turns out that step 0 in any serious engagement with darija in NLP will consist of translating its vocabulary to the widely used and most documented language in this field, namely English.

This open source project aims to be a reference in addressing this issue. We hope for the contribution of the Moroccan IT community in order to build up the largest dataset of darija-english vocabulary which will serve as a pedestal for any future application of NLP to benefit Moroccan people.


DODa video


How to contribute

We've made a tutorial for you in DODa's website


Guidelines / Recommendations

  • 3ndk ح dir ح xD (shout-out to this guy 😆 ), often try to use:
darija 3 7 9 8 2 - 'a' - 'i' 5 - 'kh'
arabic ع ح ق ه همزة خ
  • Try to use capitalization to differentiate between the following letters:
t T s S d D
ت ط س ص د ض
  • Arabic characters with two-letters Latin equivalent:
Arabic alphabet ش غ خ
Latin alphabet ch gh kh
  • Double characters to refer to the emphasis or "الشدة":
darija 7mam 7mmam
english pigeons bathroom
  • We usually don't add "e" in the end of darija words : louz instead of louze

  • We usually don't use "Z" or "th" for ظ ، ذ ، ث , because we generally don't use these letters in darija (except in northern Morocco, but for the sake of simplicity, we are focusing primarily on standard darija)

  • We do NOT use apostrophes. In fact, since we are working on csv files, apostrophes will break off words

  • We use spaces as word delimiters, not _ nor - : thank you instead of thank_you

  • Respect the number of columns in every row you add, you can use empty quotation marks "" in case you don't have extra variations

  • In every row, always start with the most used form (in your opinion of course) of the word in question

  • For future use of this dataset to train deep neural networks, try to reserve each row to similar variations of the same word. For instance, "sou9" and "marchi" both translate to "market", yet it's better to separate them into two different rows:

"sou9","souk","souq","market"

"marchi","","","market"

  • verbs.csv: The darija translation is reserved to the past tense of the third pronoun "he", whereas the other pronouns and tenses are handled in separate files. The English translation present the basic form (or root) of the English verb.

"ghnna","ghenna","ghanna","","","","sing"

  • masculine_feminine_plural.csv: If it does exist, feminine-plural translation column is for nouns. Regarding adjectives feminine-plural = feminine.

Citation

@misc{outchakoucht2021moroccan,
      title={Moroccan Dialect -Darija- Open Dataset},
      author={Aissam Outchakoucht and Hamza Es-Samaali},
      year={2021},
      eprint={2103.09687},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Owner
DODa
Darija Open Dataset
DODa
Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems

Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems This is our experimental code for RecSys 2021 paper "Learning

11 Jul 28, 2022
A PyTorch implementation of the paper "Semantic Image Synthesis via Adversarial Learning" in ICCV 2017

Semantic Image Synthesis via Adversarial Learning This is a PyTorch implementation of the paper Semantic Image Synthesis via Adversarial Learning. Req

Seonghyeon Nam 146 Nov 25, 2022
A project that uses optical flow and machine learning to detect aimhacking in video clips.

waldo-anticheat A project that aims to use optical flow and machine learning to visually detect cheating or hacking in video clips from fps games. Che

waldo.vision 542 Dec 03, 2022
PCGNN - Procedural Content Generation with NEAT and Novelty

PCGNN - Procedural Content Generation with NEAT and Novelty Generation Approach — Metrics — Paper — Poster — Examples PCGNN - Procedural Content Gener

Michael Beukman 8 Dec 10, 2022
SSL_SLAM2: Lightweight 3-D Localization and Mapping for Solid-State LiDAR (mapping and localization separated) ICRA 2021

SSL_SLAM2 Lightweight 3-D Localization and Mapping for Solid-State LiDAR (Intel Realsense L515 as an example) This repo is an extension work of SSL_SL

Wang Han 王晗 1.3k Jan 08, 2023
OpenVisionAPI server

🚀 Quick start An instance of ova-server is free and publicly available here: https://api.openvisionapi.com Checkout ova-client for a quick demo. Inst

Open Vision API 93 Nov 24, 2022
Cobalt Strike teamserver detection.

Cobalt-Strike-det Cobalt Strike teamserver detection. usage: cobaltstrike_verify.py [-l TARGETS] [-t THREADS] optional arguments: -h, --help show this

TimWhite 17 Sep 27, 2022
Semantic Edge Detection with Diverse Deep Supervision

Semantic Edge Detection with Diverse Deep Supervision This repository contains the code for our IJCV paper: "Semantic Edge Detection with Diverse Deep

Yun Liu 12 Dec 31, 2022
Complex-Valued Neural Networks (CVNN)Complex-Valued Neural Networks (CVNN)

Complex-Valued Neural Networks (CVNN) Done by @NEGU93 - J. Agustin Barrachina Using this library, the only difference with a Tensorflow code is that y

youceF 1 Nov 12, 2021
Turn based roguelike in python

pyTB Turn based roguelike in python Documentation can be found here: http://mcgillij.github.io/pyTB/index.html Screenshot Dependencies Written in Pyth

Jason McGillivray 4 Sep 29, 2022
Code for Neurips2021 Paper "Topology-Imbalance Learning for Semi-Supervised Node Classification".

Topology-Imbalance Learning for Semi-Supervised Node Classification Introduction Code for NeurIPS 2021 paper "Topology-Imbalance Learning for Semi-Sup

Victor Chen 40 Nov 23, 2022
Minimal PyTorch implementation of YOLOv3

A minimal PyTorch implementation of YOLOv3, with support for training, inference and evaluation.

Erik Linder-Norén 6.9k Dec 29, 2022
Toward Multimodal Image-to-Image Translation

BicycleGAN Project Page | Paper | Video Pytorch implementation for multimodal image-to-image translation. For example, given the same night image, our

Jun-Yan Zhu 1.4k Dec 22, 2022
A curated (most recent) list of resources for Learning with Noisy Labels

A curated (most recent) list of resources for Learning with Noisy Labels

Jiaheng Wei 321 Jan 09, 2023
[ACM MM 2021] Joint Implicit Image Function for Guided Depth Super-Resolution

Joint Implicit Image Function for Guided Depth Super-Resolution This repository contains the code for: Joint Implicit Image Function for Guided Depth

hawkey 78 Dec 27, 2022
An OpenAI Gym environment for Super Mario Bros

gym-super-mario-bros An OpenAI Gym environment for Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The Nintendo Entertainment System (NES) us

Andrew Stelmach 1 Jan 05, 2022
Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes, ICCV 2017

AdaptationSeg This is the Python reference implementation of AdaptionSeg proposed in "Curriculum Domain Adaptation for Semantic Segmentation of Urban

Yang Zhang 128 Oct 19, 2022
The pytorch implementation of SOKD (BMVC2021).

Semi-Online Knowledge Distillation Implementations of SOKD. Requirements This repo was tested with Python 3.8, PyTorch 1.5.1, torchvision 0.6.1, CUDA

4 Dec 19, 2021
SegNet-Basic with Keras

SegNet-Basic: What is Segnet? Deep Convolutional Encoder-Decoder Architecture for Semantic Pixel-wise Image Segmentation Segnet = (Encoder + Decoder)

Yad Konrad 81 Jun 30, 2022
PyTorch implementation of SQN based on CloserLook3D's encoder

SQN_pytorch This repo is an implementation of Semantic Query Network (SQN) using CloserLook3D's encoder in Pytorch. For TensorFlow implementation, che

PointCloudYC 1 Oct 21, 2021