CROSS-LINGUAL ABILITY OF MULTILINGUAL BERT: AN EMPIRICAL STUDY

Last update: Feb 28, 2022

M-BERT-Study

Motivation

Multilingual BERT (M-BERT) has shown surprising cross-lingual abilities, even though it is trained without any cross-lingual objective. In this work, we analyze what causes this multilinguality along three dimensions: the linguistic properties of the languages, the architecture of the model, and the learning objectives.

Results

Linguistic properties:

Code-switching text (word-piece overlap) is not the main cause of multilinguality.
Word ordering is crucial: when the words in each sentence are randomly permuted, multilinguality is low, although still significantly better than random.
(Unigram) word frequency alone is not enough: when we resampled all words according to the same frequency distribution, performance was almost random. Taken together, the second and third findings suggest that two languages share a similarity, beyond the ordering of words, that unigram frequency does not capture. We hypothesize that it may be similarity of n-gram occurrences.
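
The two corpus perturbations described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the repository's preprocessing code: the function names are hypothetical, words are split on whitespace, and "resampling" is read as drawing each word independently from the corpus unigram distribution.

```python
import random
from collections import Counter


def permute_words(sentence, rng):
    """Return the sentence with its whitespace-separated words randomly shuffled."""
    words = sentence.split()
    rng.shuffle(words)
    return " ".join(words)


def resample_by_unigram_frequency(sentences, rng):
    """Replace every word with an independent draw from the corpus unigram distribution.

    Sentence lengths are preserved; word order and co-occurrence structure are not.
    """
    counts = Counter(w for s in sentences for w in s.split())
    vocab = list(counts)
    weights = [counts[w] for w in vocab]
    return [
        " ".join(rng.choices(vocab, weights=weights, k=len(s.split())))
        for s in sentences
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
corpus = ["the cat sat on the mat", "a dog chased the cat"]  # toy data
permuted = [permute_words(s, rng) for s in corpus]
resampled = resample_by_unigram_frequency(corpus, rng)
```

Permutation keeps each sentence's bag of words but destroys order; resampling keeps only the corpus-level word frequencies, which is why the second perturbation removes more structure than the first.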

Architecture:

Depth of the transformer is the most important factor.
The number of attention heads affects absolute performance on individual languages, but the gap between in-language supervision and cross-language zero-shot learning did not change much.
The total number of parameters, like depth, affects multilinguality.
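
To make the "gap" in the second finding concrete, here is a small sketch of the comparison, assuming the gap is simply the in-language supervised score minus the cross-language zero-shot score. The function name and the numbers are hypothetical and are not results from the paper.

```python
def zero_shot_gap(in_language_score, zero_shot_score):
    """Difference between supervised in-language and zero-shot cross-language scores."""
    return in_language_score - zero_shot_score


# Made-up scores for illustration only: (in-language, zero-shot) per model configuration.
configs = {"config_a": (80.0, 60.0), "config_b": (85.0, 64.0)}
gaps = {name: zero_shot_gap(*scores) for name, scores in configs.items()}
```

In this toy example both scores rise from `config_a` to `config_b` while the gap stays nearly constant, which is the pattern the attention-head finding describes.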

Learning Objectives:

Removing the Next Sentence Prediction objective leads to a slight increase in performance.
Even marking sentences with language IDs, which lets BERT know exactly which language it is learning on, did not hurt performance.
Using word-pieces leads to strong improvements on both the source and target languages (likely task-dependent) and a slight cross-lingual improvement compared to word-based or character-based models.

Please refer to our paper for more details.

Scripts

Creating pre-training data

If you would like to pre-train BERT on a fake language or on permuted sentences, see preprocessing-scripts for how to create the tfrecords for BERT training.
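
As a rough illustration of what a fake language can look like, the sketch below shifts every non-whitespace character by a fixed Unicode offset, so the transformed text shares no characters (and therefore no word-pieces) with the original. The offset value and function names are assumptions made for this example; the actual preprocessing-scripts may work differently, and tfrecord creation is not shown.

```python
SHIFT = 0x10000  # hypothetical offset that moves Basic Multilingual Plane text out of its range


def to_fake_language(text, shift=SHIFT):
    """Shift every non-whitespace character by a fixed code-point offset."""
    return "".join(c if c.isspace() else chr(ord(c) + shift) for c in text)


def from_fake_language(text, shift=SHIFT):
    """Invert to_fake_language for text produced with the same offset."""
    return "".join(c if c.isspace() else chr(ord(c) - shift) for c in text)


original = "the cat sat on the mat"  # toy input
fake = to_fake_language(original)
```

Because whitespace is left untouched, sentence length and word boundaries are preserved, so the fake corpus keeps the structure of the original while having zero vocabulary overlap with it.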

Pre-training BERT

Once you have uploaded the tfrecords to Google Cloud, you can set up an instance and start BERT training via bert-running-scripts.

Evaluating

For the models we provide, or ones you have just trained, we provide code for evaluating on two tasks: NER and entailment. See evaluating-scripts.

BERT Models

We release the following BERT models (in a few days):

Word-piece Experiments
Word Order Experiments
Word Frequency Experiments
Model Structure Experiments

See data for detailed paths to download (in a few days).

Requirements

allennlp: 0.9.0
ccg_nlpy

Citation

Please cite the following paper if you find our work useful. Thanks!

Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth. "Cross-Lingual Ability of Multilingual BERT: An Empirical Study" arXiv preprint arXiv:1912.07840 (2019).

@article{wang2019cross,
  title={Cross-Lingual Ability of Multilingual BERT: An Empirical Study},
  author={K, Karthikeyan and Wang, Zihan and Mayhew, Stephen and Roth, Dan},
  journal={arXiv preprint arXiv:1912.07840},
  year={2019}
}

Owner

CogComp
