Code for ACL2021 paper Consistency Regularization for Cross-Lingual Fine-Tuning.

Related tags

Deep LearningxTune
Overview

xTune

Code for ACL2021 paper Consistency Regularization for Cross-Lingual Fine-Tuning.

Environment

DockerFile: dancingsoul/pytorch:xTune

Install the fine-tuning code: pip install --user .

Data & Model Preparation

XTREME Datasets

  1. Create a download folder with mkdir -p download in the root of this project.
  2. manually download panx_dataset (for NER) [here][2], (note that it will download as AmazonPhotos.zip) to the download directory.
  3. run the following command to download the remaining datasets: bash scripts/download_data.sh The code of downloading dataset from XTREME is from [xtreme offical repo][1].

Note that we keep the labels in test set for easier evaluation. To prevent accidental evaluation on the test sets while running experiments, the code of [xtreme offical repo][1] removes labels of the test data during pre-processing and changes the order of the test sentences for cross-lingual sentence retrieval. Replace csv.writer(fout, delimiter='\t') with csv.writer(fout, delimiter='\t', quoting=csv.QUOTE_NONE, quotechar='') in utils_process.py if using XTREME official repo.

Translations

XTREME provides translations for SQuAD v1.1 (only train and dev), MLQA, PAWS-X, TyDiQA-GoldP, XNLI, and XQuAD, which can be downloaded from [here][3]. The xtreme_translations folder should be moved to the download directory.

The target language translations for panx and udpos are obtained with Google Translate, since they are not provided. Our processed version can be downloaded from [here][4]. It should be merged with the above xtreme_translations folder.

Bi-lingual dictionaries

We obtain the bi-lingual dictionaries from the [MUSE][6] repo. For convenience, you can download them from [here][7] and move it to the download directory, i.e., ./download/dicts.

Models

XLM-Roberta is supported. We utilize the [huggingface][5] format, which can be downloaded with bash scripts/download_model.sh.

Fine-tuning Usage

Our default settings were using Nvidia V100-32GB GPU cards. If there were out-of-memory errors, you can reduce per_gpu_train_batch_size while increasing gradient_accumulation_steps, or use multi-GPU training.

xTune consists of a two-stage training process.

  • Stage 1: fine-tuning with example consistency on the English training set.
  • Stage 2: fine-tuning with example consistency on the augmented training set and regularize model consistency with the model from Stage 1.

It's recommended to use both Stage 1 and Stage 2 for token-level tasks, such as sequential labeling, and question answering. For text classification, you can only use Stage 1 if the computation budget was limited.

bash ./scripts/train.sh [setting] [dataset] [model] [stage] [gpu] [data_dir] [output_dir]

where the options are described as follows:

  • [setting]: translate-train-all (using input translation for the languages other than English) or cross-lingual-transfer (only using English for zero-shot cross-lingual transfer)
  • [dataset]: dataset names in XTREME, i.e., xnli, panx, pawsx, udpos, mlqa, tydiqa, xquad
  • [model]: xlm-roberta-base, xlm-roberta-large
  • [stage]: 1 (first stage), 2 (second stage)
  • [gpu]: used to set environment variable CUDA_VISIBLE_DEVICES
  • [data_dir]: folder of training data
  • [output_dir]: folder of fine-tuning output

Examples: XTREME Tasks

XNLI fine-tuning on English training set and translated training sets (translate-train-all)

# run stage 1 of xTune
bash ./scripts/train.sh translate-train-all xnli xlm-roberta-base 1
# run stage 2 of xTune (optional)
bash ./scripts/train.sh translate-train-all xnli xlm-roberta-base 2

XNLI fine-tuning on English training set (cross-lingual-transfer)

# run stage 1 of xTune
bash ./scripts/train.sh cross-lingual-transfer xnli xlm-roberta-base 1
# run stage 2 of xTune (optional)
bash ./scripts/train.sh cross-lingual-transfer xnli xlm-roberta-base 2

Paper

Please cite our paper \cite{bo2021xtune} if you found the resources in the repository useful.

@inproceedings{bo2021xtune,
author = {Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei},
booktitle = {Proceedings of ACL 2021},
title = {{Consistency Regularization for Cross-Lingual Fine-Tuning}},
year = {2021}
}

Reference

  1. https://github.com/google-research/xtreme
  2. https://www.amazon.com/clouddrive/share/d3KGCRCIYwhKJF0H3eWA26hjg2ZCRhjpEQtDL70FSBN?_encoding=UTF8&%2AVersion%2A=1&%2Aentries%2A=0&mgh=1
  3. https://console.cloud.google.com/storage/browser/xtreme_translations
  4. https://drive.google.com/drive/folders/1Rdbc0Us_4I5MpRCwLASxBwqSW8_dlF87?usp=sharing
  5. https://github.com/huggingface/transformers/
  6. https://github.com/facebookresearch/MUSE
  7. https://drive.google.com/drive/folders/1k9rQinwUXicglA5oyzo9xtgqiuUVDkjT?usp=sharing
Owner
Bo Zheng
Bo Zheng
Text-to-Image generation

Generate vivid Images for Any (Chinese) text CogView is a pretrained (4B-param) transformer for text-to-image generation in general domain. Read our p

THUDM 1.3k Dec 29, 2022
PyTorch Implementation of DSB for Score Based Generative Modeling. Experiments managed using Hydra.

Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling This repository contains the implementation for the paper Diffusion

James Thornton 50 Jan 03, 2023
Combine Tacotron2 and Hifi GAN to generate speech from text

EndToEndTextToSpeech Combine Tacotron2 and Hifi GAN to generate speech from text Download weights Hifi GAN - hifi_gan/checkpoint/ : pretrain 2.5M ste

Phạm Quốc Huy 1 Dec 18, 2021
High frequency AI based algorithmic trading module.

Flow Flow is a high frequency algorithmic trading module that uses machine learning to self regulate and self optimize for maximum return. The current

59 Dec 14, 2022
Build tensorflow keras model pipelines in a single line of code. Created by Ram Seshadri. Collaborators welcome. Permission granted upon request.

deep_autoviml Build keras pipelines and models in a single line of code! Table of Contents Motivation How it works Technology Install Usage API Image

AutoViz and Auto_ViML 102 Dec 17, 2022
Meta Language-Specific Layers in Multilingual Language Models

Meta Language-Specific Layers in Multilingual Language Models This repo contains the source codes for our paper On Negative Interference in Multilingu

Zirui Wang 20 Feb 13, 2022
Differentiable simulation for system identification and visuomotor control

gradsim gradSim: Differentiable simulation for system identification and visuomotor control gradSim is a unified differentiable rendering and multiphy

105 Dec 18, 2022
Single Red Blood Cell Hydrodynamic Traps Via the Generative Design

Rbc-traps-generative-design - The generative design for single red clood cell hydrodynamic traps using GEFEST framework

Natural Systems Simulation Lab 4 Jun 16, 2022
[ICCV2021] Learning to Track Objects from Unlabeled Videos

Unsupervised Single Object Tracking (USOT) 🌿 Learning to Track Objects from Unlabeled Videos Jilai Zheng, Chao Ma, Houwen Peng and Xiaokang Yang 2021

53 Dec 28, 2022
Code for EMNLP2020 long paper: BERT-Attack: Adversarial Attack Against BERT Using BERT

BERT-ATTACK Code for our EMNLP2020 long paper: BERT-ATTACK: Adversarial Attack Against BERT Using BERT Dependencies Python 3.7 PyTorch 1.4.0 transform

Linyang Li 142 Jan 04, 2023
The Malware Open-source Threat Intelligence Family dataset contains 3,095 disarmed PE malware samples from 454 families

MOTIF Dataset The Malware Open-source Threat Intelligence Family (MOTIF) dataset contains 3,095 disarmed PE malware samples from 454 families, labeled

Booz Allen Hamilton 112 Dec 13, 2022
2D Human Pose estimation using transformers. Implementation in Pytorch

PE-former: Pose Estimation Transformer Vision transformer architectures perform very well for image classification tasks. Efforts to solve more challe

Panteleris Paschalis 23 Oct 17, 2022
Very deep VAEs in JAX/Flax

Very Deep VAEs in JAX/Flax Implementation of the experiments in the paper Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on I

Jamie Townsend 42 Dec 12, 2022
This is an open-source toolkit for Heterogeneous Graph Neural Network(OpenHGNN) based on DGL [Deep Graph Library] and PyTorch.

This is an open-source toolkit for Heterogeneous Graph Neural Network(OpenHGNN) based on DGL [Deep Graph Library] and PyTorch.

BUPT GAMMA Lab 519 Jan 02, 2023
ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library ERISHA is a multilingual multispeaker expressive speech synthesis framework. It ca

Ajinkya Kulkarni 43 Nov 27, 2022
Model Zoo for AI Model Efficiency Toolkit

We provide a collection of popular neural network models and compare their floating point and quantized performance.

Qualcomm Innovation Center 137 Jan 03, 2023
Explainable Medical ImageSegmentation via GenerativeAdversarial Networks andLayer-wise Relevance Propagation

MedAI: Transparency in Medical Image Segmentation What is this repo This repo contains the code and experiments that are implemented to contribute in

Awadelrahman M. A. Ahmed 1 Nov 22, 2021
An ML & Correlation platform for transforming disparate data points of interest into usable intelligence.

SSIDprobeCollector An ML & Correlation platform for transforming disparate data points of interest into usable intelligence. At a High level the platf

Bill Reyor 1 Jan 30, 2022
Mmrotate - OpenMMLab Rotated Object Detection Benchmark

OpenMMLab website HOT OpenMMLab platform TRY IT OUT 📘 Documentation | 🛠️ Insta

OpenMMLab 1.2k Jan 04, 2023
AgeGuesser: deep learning based age estimation system. Powered by EfficientNet and Yolov5

AgeGuesser AgeGuesser is an end-to-end, deep-learning based Age Estimation system, presented at the CAIP 2021 conference. You can find the related pap

5 Nov 10, 2022