Contrastive Learning for Many-to-many Multilingual Neural Machine Translation(mCOLT/mRASP2), ACL2021


This repository contains the code for training mCOLT/mRASP2, a multilingual NMT training framework built on fairseq.

mRASP2: paper

mRASP: paper, code


News

We have released two versions; this one is the original. In this implementation:

  • You should first merge all data, pre-pending a language token to each sentence to indicate its language.
  • AA/RAS must be done offline (before binarization); see this toolkit.
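
The merging step above can be sketched as follows. This is a minimal illustration, not the repository's preprocessing code: the `LANG_TOK_<UPPERCASE CODE>` token format is taken from the inference command later in this README, while the function names, the in-memory corpus layout, and the choice to tag both source and target sides are assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def lang_token(lang: str) -> str:
    """Build a language token such as LANG_TOK_EN (format assumed from the inference command)."""
    return "LANG_TOK_" + lang.upper()


def tag_sentences(sentences: Iterable[str], lang: str) -> List[str]:
    """Pre-pend the language token to every sentence."""
    token = lang_token(lang)
    return [f"{token} {sentence.strip()}" for sentence in sentences]


def merge_corpora(
    corpora: List[Tuple[str, str, List[str], List[str]]]
) -> Tuple[List[str], List[str]]:
    """Merge several (src_lang, tgt_lang, src_sents, tgt_sents) corpora into one tagged corpus."""
    merged_src: List[str] = []
    merged_tgt: List[str] = []
    for src_lang, tgt_lang, src_sents, tgt_sents in corpora:
        merged_src.extend(tag_sentences(src_sents, src_lang))
        merged_tgt.extend(tag_sentences(tgt_sents, tgt_lang))
    return merged_src, merged_tgt


# Hypothetical toy corpora, for illustration only.
corpora = [
    ("en", "de", ["Hello world ."], ["Hallo Welt ."]),
    ("en", "fr", ["Thank you ."], ["Merci ."]),
]
merged_src, merged_tgt = merge_corpora(corpora)
for src_line, tgt_line in zip(merged_src, merged_tgt):
    print(src_line, "|||", tgt_line)
```

In practice the merged lists would be written to files and then passed through BPE and binarization.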

New implementation: https://github.com/PANXiao1994/mRASP2/tree/new_impl

  • Acknowledgement: This work is supported by Bytedance. We thank Chengqi for uploading all files and checkpoints.

Introduction

mRASP2/mCOLT, short for multilingual Contrastive Learning for Transformer, is a multilingual neural machine translation model that supports complete many-to-many multilingual machine translation. It employs both parallel corpora and multilingual corpora in a unified training framework. For details, please refer to the paper.


Pre-requisite

pip install -r requirements.txt

Training Data and Checkpoints

We release our preprocessed training data and checkpoints below.

Dataset

We merge 32 English-centric language pairs, resulting in 64 directed translation pairs in total. The original corpus of 32 language pairs contains about 197M sentence pairs. After applying RAS we get about 262M sentence pairs, since we keep both the original sentences and the substituted sentences. We release both the original dataset and the dataset after applying RAS.

Dataset                           #Pair
32-lang-pairs-TRAIN               197603294
32-lang-pairs-RAS-TRAIN           262662792
mono-split-a                      -
mono-split-b                      -
mono-split-c                      -
mono-split-d                      -
mono-split-e                      -
mono-split-de-fr-en               -
mono-split-nl-pl-pt               -
32-lang-pairs-DEV-en-centric      -
32-lang-pairs-DEV-many-to-many    -
Vocab                             -
BPE Code                          -
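
As a quick sanity check on the numbers above, the following sketch works out the number of translation directions and how many sentence pairs RAS adds. It assumes the RAS set consists of the original pairs plus the substituted ones, as the description states; the variable names are chosen for this example.

```python
# Counts taken from the dataset table in this README.
ORIGINAL_PAIRS = 197_603_294
RAS_PAIRS = 262_662_792
LANGUAGE_PAIRS = 32

# Each English-centric pair is used in both directions.
directions = LANGUAGE_PAIRS * 2

# Pairs added by RAS, assuming the RAS set is the original pairs plus substituted ones.
added_pairs = RAS_PAIRS - ORIGINAL_PAIRS
growth = added_pairs / ORIGINAL_PAIRS

print(directions)        # 64
print(added_pairs)       # 65059498
print(f"{growth:.1%}")   # 32.9%
```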

Checkpoints & Results

  • Please note that the provided checkpoints are slightly different from those in the paper. In the following sections, we report the results of the provided checkpoints.

English-centric Directions

We report tokenized BLEU in the following table. (check eval.sh for details)

Direction/test set   6e6d-no-mono   12e12d-no-mono   12e12d
en2cs/wmt16          21.0           22.3             23.8
cs2en/wmt16          29.6           32.4             33.2
en2fr/wmt14          42.0           43.3             43.4
fr2en/wmt14          37.8           39.3             39.5
en2de/wmt14          27.4           29.2             29.5
de2en/wmt14          32.2           34.9             35.2
en2zh/wmt17          33.0           34.9             34.1
zh2en/wmt17          22.4           24.0             24.4
en2ro/wmt16          26.6           28.1             28.7
ro2en/wmt16          36.8           39.0             39.1
en2tr/wmt16          18.6           20.3             21.2
tr2en/wmt16          22.2           25.5             26.1
en2ru/wmt19          17.4           18.5             19.2
ru2en/wmt19          22.0           23.2             23.6
en2fi/wmt17          20.2           22.1             22.9
fi2en/wmt17          26.1           29.5             29.7
en2es/wmt13          32.8           34.1             34.6
es2en/wmt13          32.8           34.6             34.7
en2it/wmt09          28.9           30.0             30.8
it2en/wmt09          31.4           32.7             32.8

Unsupervised Directions

We report tokenized BLEU in the following table. (check eval.sh for details)

Direction/test set   12e12d
en2pl/wmt20          6.2
pl2en/wmt20          13.5
en2nl/iwslt14        8.8
nl2en/iwslt14        27.1
en2pt/opus100        18.9
pt2en/opus100        29.2

Zero-shot Directions

  • row: source language
  • column: target language

We report sacreBLEU in the following table.

12e12d   ar     zh     nl     fr     de     ru
ar       -      32.5   3.2    22.8   11.2   16.7
zh       6.5    -      1.9    32.9   7.6    23.7
nl       1.7    8.2    -      7.5    10.2   2.9
fr       6.2    42.3   7.5    -      18.9   24.4
de       4.9    21.6   9.2    24.7   -      14.4
ru       7.1    40.6   4.5    29.9   13.5   -

Training

export NUM_GPU=4 && bash train_w_mono.sh ${model_config}
  • An example ${model_config} is provided at ${PROJECT_REPO}/examples/configs/parallel_mono_12e12d_contrastive.yml

Inference

  • You must pre-pend the corresponding language token to the source side before binarizing the test data.
fairseq-generate ${test_path} \
    --user-dir ${repo_dir}/mcolt \
    -s ${src} \
    -t ${tgt} \
    --skip-invalid-size-inputs-valid-test \
    --path ${ckpts} \
    --max-tokens ${batch_size} \
    --task translation_w_langtok \
    ${options} \
    --lang-prefix-tok "LANG_TOK_"`echo "${tgt} " | tr '[a-z]' '[A-Z]'` \
    --max-source-positions ${max_source_positions} \
    --max-target-positions ${max_target_positions} \
    --nbest 1 | grep -E '[S|H|P|T]-[0-9]+' > ${final_res_file}
python3 ${repo_dir}/scripts/utils.py ${res_file} ${ref_file} || exit 1;
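
The command above keeps only the `S-`, `H-`, `P-`, and `T-` lines of the `fairseq-generate` output. The sketch below shows one way such a file could be read back to recover hypotheses in corpus order. It relies on the tab-separated layout that `fairseq-generate` prints (on `H-` lines the score precedes the hypothesis text). It is not the repository's `scripts/utils.py`, whose behavior this README does not describe, and the function name and sample lines are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Matches lines such as "H-12\t-0.31\tsome hypothesis".
LINE_RE = re.compile(r"^([SHPT])-(\d+)\t(.*)$")


def parse_generate_output(lines: Iterable[str]) -> Dict[int, Dict[str, str]]:
    """Group source (S), reference (T), and hypothesis (H) text by sample id."""
    records: Dict[int, Dict[str, str]] = {}
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue
        kind, sample_id, rest = match.group(1), int(match.group(2)), match.group(3)
        if kind == "H":
            # H lines carry "<score>\t<hypothesis>"; keep only the text.
            _score, _sep, text = rest.partition("\t")
            records.setdefault(sample_id, {})["H"] = text
        elif kind in ("S", "T"):
            records.setdefault(sample_id, {})[kind] = rest
        # P lines (per-token scores) are ignored in this sketch.
    return records


# Hypothetical output lines; fairseq-generate does not emit samples in corpus order.
sample_lines = [
    "S-1\tLANG_TOK_EN Thank you .",
    "T-1\tMerci .",
    "H-1\t-0.25\tMerci .",
    "P-1\t-0.20 -0.30 -0.25",
    "S-0\tLANG_TOK_EN Hello world .",
    "T-0\tBonjour le monde .",
    "H-0\t-0.40\tBonjour le monde .",
    "P-0\t-0.40 -0.40 -0.40 -0.40 -0.40",
]
records = parse_generate_output(sample_lines)
hypotheses: List[str] = [records[i]["H"] for i in sorted(records)]
references: List[str] = [records[i]["T"] for i in sorted(records)]
print(hypotheses)
```

Sorting by sample id restores the original corpus order, so hypotheses and references line up for scoring.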

Synonym dictionaries

We use the bilingual synonym dictionaries provided by MUSE.

We generate multilingual synonym dictionaries using this script, and apply RAS using this script.

Description   File                    Size
dep=1         synonym_dict_raw_dep1   138.0 M
dep=2         synonym_dict_raw_dep2   1.6 G
dep=3         synonym_dict_raw_dep3   2.2 G
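
To illustrate the idea behind RAS (random aligned substitution), the sketch below replaces words with synonyms from a dictionary at random and keeps both the original and the substituted sentence, as described in the Dataset section. It is a toy example under stated assumptions, not the repository's script: the dictionary layout (word mapped to a list of synonyms in other languages), the replacement probability, and the function name are all hypothetical.

```python
import random
from typing import Dict, List


def random_aligned_substitution(
    sentence: str,
    synonyms: Dict[str, List[str]],
    prob: float,
    rng: random.Random,
) -> str:
    """Replace each word that has dictionary synonyms with probability `prob`."""
    output_words = []
    for word in sentence.split():
        candidates = synonyms.get(word)
        if candidates and rng.random() < prob:
            output_words.append(rng.choice(candidates))
        else:
            output_words.append(word)
    return " ".join(output_words)


# Hypothetical toy dictionary; the released files are far larger.
toy_synonyms = {
    "hello": ["bonjour", "hallo"],
    "world": ["monde", "Welt"],
}

rng = random.Random(0)  # fixed seed so the example is reproducible
original = "hello world ."
substituted = random_aligned_substitution(original, toy_synonyms, prob=0.3, rng=rng)

# Keep both versions, mirroring how the RAS training set is described.
augmented = [original, substituted]
print(augmented)
```

With `prob=1.0` every dictionary word is replaced, and with `prob=0.0` the sentence is returned unchanged, which makes the function easy to check.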

Contact

Please contact me via e-mail [email protected] or via wechat/zhihu

Citation

Please cite as:

@inproceedings{mrasp2,
  title = {Contrastive Learning for Many-to-many Multilingual Neural Machine Translation},
  author= {Xiao Pan and
           Mingxuan Wang and
           Liwei Wu and
           Lei Li},
  booktitle = {Proceedings of ACL 2021},
  year = {2021},
}