Multilingual finetuning of Machine Translation model on low-resource languages. Project for Deep Natural Language Processing course.

Last update: Jun 22, 2022

Overview

Low-resource-Machine-Translation

This repository contains the code for the project of the Deep Natural Language Processing course. The goal of the project is to replicate the experiments performed by Dabre et al. on low-resource machine translation: starting from a machine translation model pretrained on a large dataset, we finetune it on a low-resource language.

Implementation details

The initial model chosen for the task is MarianMT, a transformer-based model pretrained on a large English-Chinese corpus. The model is finetuned on three low-resource languages from the ALT dataset (Vietnamese, Indonesian and Filipino). The finetuning is performed using the Hugging Face 🤗 Transformers library.
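The finetuning step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual training script: the checkpoint name `Helsinki-NLP/opus-mt-en-zh`, the translation direction (English to the low-resource language), the helper names `make_pairs` and `finetune`, and all hyperparameters are assumptions made for the example. The heavy imports are kept inside `finetune` so that the data helper can be used on its own.

```python
# Assumption: a public English-Chinese MarianMT checkpoint; the README does
# not say which exact checkpoint the project starts from.
CHECKPOINT = "Helsinki-NLP/opus-mt-en-zh"


def make_pairs(src_lines, tgt_lines):
    """Pair aligned source/target sentences, dropping pairs with an empty side."""
    pairs = []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:
            pairs.append({"src": src, "tgt": tgt})
    return pairs


def finetune(pairs, output_dir="marian-finetuned", epochs=1, batch_size=8):
    """Hypothetical plain-PyTorch finetuning loop over sentence pairs."""
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(CHECKPOINT)
    model = MarianMTModel.from_pretrained(CHECKPOINT)
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # assumed learning rate

    model.train()
    for _ in range(epochs):
        for start in range(0, len(pairs), batch_size):
            batch = pairs[start:start + batch_size]
            enc = tokenizer(
                [p["src"] for p in batch],
                text_target=[p["tgt"] for p in batch],
                padding=True,
                truncation=True,
                max_length=128,
                return_tensors="pt",
            )
            # Mask padding positions so they do not contribute to the loss.
            labels = enc["labels"]
            labels[labels == tokenizer.pad_token_id] = -100

            loss = model(**enc).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)


if __name__ == "__main__":
    # Toy English-Vietnamese pairs; real data would come from the ALT dataset.
    demo = make_pairs(["Hello.", "Thank you."], ["Xin chào.", "Cảm ơn."])
    print(demo)
```

Calling `finetune(demo)` would download the checkpoint and run one pass over the toy pairs; in practice the pairs would be read from the ALT parallel files for Vietnamese, Indonesian or Filipino.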

Owner

Andrea Cavallo

MSc in Computer Engineering and Artificial Intelligence

In this project, we aim to achieve the task of predicting emojis from tweets. We aim to investigate the relationship between words and emojis.

Making Emojis More Predictable by Karan Abrol, Karanjot Singh and Pritish Wadhwa, Natural Language Processing (CSE546) under the guidance of Dr. Shad

2 Jan 17, 2022

A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.

A python package to fine-tune transformer-based models for named entity recognition (NER).

ChessCoach is a neural network-based chess engine capable of natural-language commentary.

Legal text retrieval for python

FedNLP: A Benchmarking Framework for Federated Learning in Natural Language Processing

Free and Open Source Machine Translation API. 100% self-hosted, offline capable and easy to setup.

Fixes mojibake and other glitches in Unicode text, after the fact.

Conversational text Analysis using various NLP techniques

Official code for "Parser-Free Virtual Try-on via Distilling Appearance Flows", CVPR 2021

A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

Text-to-speech toolkit. An easy-to-use Chinese speech synthesis toolbox, including a speech encoder, a speech synthesizer, a vocoder and a visualization module.

Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

Generating Korean Slogans with phonetic and structural repetition

Medical entity extraction on the CMeEE dataset

Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃

A 30000+ Chinese MRC dataset - Delta Reading Comprehension Dataset

Edge-Augmented Graph Transformer

A programming language with logic of Python, and syntax of all languages.

A Semi-Intelligent ChatBot filled with statistical and economical data for the Premier League.