Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Last update: Jul 16, 2022


Overview

Cross-Attention Transfer for Machine Translation

This repo hosts the code to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021.

Setup

We provide our scripts and our modifications to Fairseq. This section describes how to run the code and, for instance, reproduce Table 2 in the paper.

Data

To view the data as we prepared and used it, switch to the main branch. We recommend cloning the code from this branch, however, to avoid downloading a large amount of data at once. You can always fetch individual data files from the main branch as needed.

Installations

We worked in a conda environment with Python 3.8.

First install the requirements.
```
  pip install -r requirements.txt
```
Then install Fairseq. To have the option to modify the package, install it in editable mode.
```
  cd fairseq-modified
  pip install -e .
```
Finally, set the following environment variable.
```
  export FAIRSEQ=$PWD
  cd ..
```
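The repo's scripts presumably rely on `FAIRSEQ`, so it may help to confirm that the variable points at the modified Fairseq checkout before going further. The following is a minimal sketch, not part of the repo: `check_fairseq_dir` is a hypothetical helper, and it assumes the checkout contains a `setup.py` at its top level.

```shell
# Hypothetical helper (not part of the repo): fail early if the given
# directory is empty or does not look like a Fairseq checkout.
# Assumption: the checkout has a setup.py at its top level.
check_fairseq_dir() {
  dir="${1:-}"
  if [ -z "$dir" ]; then
    echo "FAIRSEQ is not set" >&2
    return 1
  fi
  if [ ! -f "$dir/setup.py" ]; then
    echo "no setup.py under $dir" >&2
    return 1
  fi
  echo "FAIRSEQ looks OK: $dir"
}
```

After setting the variable, you would call it as `check_fairseq_dir "$FAIRSEQ"`.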

Experiments

For the purpose of this walk-through, we assume we want to train a De–En model, using the following data:

```
De-En
├── iwslt13.test.de
├── iwslt13.test.en
├── iwslt13.test.tok.de
├── iwslt13.test.tok.en
├── iwslt15.tune.de
├── iwslt15.tune.en
├── iwslt15.tune.tok.de
├── iwslt15.tune.tok.en
├── iwslt16.train.de
├── iwslt16.train.en
├── iwslt16.train.tok.de
└── iwslt16.train.tok.en
```

We transfer from a Fr–En parent model, whose experiment files are stored under FrEn/checkpoints.
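Before preprocessing, it can be worth checking that each file prefix has both language sides and that the two sides have the same number of lines. The sketch below is a hypothetical pre-flight check, not one of the repo's scripts. It assumes that a prefix such as `De-En/iwslt16.train.tok` is completed with `.de` and `.en`, which matches the file names listed above.

```shell
# Hypothetical pre-flight check (not part of the repo's scripts).
# Usage: check_parallel SRC_LANG TGT_LANG PREFIX...
# Assumption: each PREFIX is completed with ".SRC_LANG" and ".TGT_LANG".
check_parallel() {
  src="$1"; tgt="$2"; shift 2
  for prefix in "$@"; do
    # Both sides of the parallel corpus must exist.
    for lang in "$src" "$tgt"; do
      if [ ! -f "$prefix.$lang" ]; then
        echo "missing: $prefix.$lang" >&2
        return 1
      fi
    done
    # Parallel files must have one sentence per line on each side.
    n_src=$(wc -l < "$prefix.$src" | tr -d ' ')
    n_tgt=$(wc -l < "$prefix.$tgt" | tr -d ' ')
    if [ "$n_src" -ne "$n_tgt" ]; then
      echo "line count mismatch for $prefix: $n_src vs $n_tgt" >&2
      return 1
    fi
  done
  echo "all parallel files present and aligned"
}
```

For the data above, the call would be `check_parallel de en De-En/iwslt16.train.tok De-En/iwslt15.tune.tok De-En/iwslt13.test.tok`.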

Start by making an experiment folder and preprocessing the data.
```
  mkdir test_exp
  ./xattn-transfer-for-mt/scripts/data_preprocessing/prepare_bi.sh \
      de en test_exp/ \
      De-En/iwslt16.train.tok De-En/iwslt15.tune.tok De-En/iwslt13.test.tok \
      8000
```
Note that prepare_bi.sh is written for the most general case, in which vocabularies are learned for both the source and target sides. Modify it as needed to reuse an existing vocabulary. Here, for example, because we are transferring from Fr–En to De–En, we reuse the target-side vocabulary from the parent. The 8000 therefore refers to the source vocabulary size, and we copy the parent's target vocabulary instead of learning one in the script.
```
  cp ./FrEn/data/tgt.sentencepiece.bpe.model $DATA
  cp ./FrEn/data/tgt.sentencepiece.bpe.vocab $DATA
```
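The copy commands above use `$DATA`, which is not defined anywhere in this walk-through; presumably it should point to the directory where the preprocessed data lives. Under that assumption, the copy step could be guarded as in the following sketch, where `copy_parent_vocab` is a hypothetical helper and not part of the repo.

```shell
# Hypothetical helper (not part of the repo): copy the parent's target-side
# SentencePiece files into the child's data directory, failing loudly if the
# destination is unset or a parent file is missing.
# Usage: copy_parent_vocab PARENT_DATA_DIR CHILD_DATA_DIR
copy_parent_vocab() {
  parent_dir="$1"; data_dir="$2"
  if [ -z "$data_dir" ] || [ ! -d "$data_dir" ]; then
    echo "destination is not a directory: '$data_dir'" >&2
    return 1
  fi
  for ext in model vocab; do
    f="$parent_dir/tgt.sentencepiece.bpe.$ext"
    if [ ! -f "$f" ]; then
      echo "missing parent file: $f" >&2
      return 1
    fi
    cp "$f" "$data_dir/"
  done
}
```

With `DATA` set, the equivalent of the two `cp` commands would be `copy_parent_vocab ./FrEn/data "$DATA"`.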
Now you can run an experiment. Here we want to update only the source embeddings and the cross-attention, so we run the corresponding script; each script's name describes what it updates. Set the path to the desired parent model checkpoint in the script, then run:
```
  bash ./xattn-transfer-for-mt/scripts/training/reinit-src-embeddings-and-finetune-parent-model-on-translation_src+xattn.sh \
      test_exp/ de en
```

Finally, after training, evaluate your model. Set the path to your detokenizer in the script, then run:

```
  bash ./xattn-transfer-for-mt/scripts/evaluation/decode_and_score_valid_and_test.sh \
      test_exp/ de en \
      $PWD/De-En/iwslt15.tune.en $PWD/De-En/iwslt13.test.en
```

Issues

Please contact us and report any problems you might face through the issues tab of the repo. Thanks in advance for helping us improve the repo!

Credits

The main body of code is built upon Fairseq. We found it very easy to navigate and modify. Kudos to the developers!
The data preprocessing scripts are adapted from the FLORES scripts.
To fit mBART into the memory of the GPUs we worked with, we used the trimming solution provided here.

Citation

@inproceedings{gheini-cross-attention,
  title = "Cross-Attention is All You Need: {A}dapting Pretrained {T}ransformers for Machine Translation",
  author = "Gheini, Mozhdeh and Ren, Xiang and May, Jonathan",
  booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
  month = nov,
  year = "2021"
}

Owner

Mozhdeh Gheini
