Load What You Need: Smaller Multilingual Transformers for PyTorch and TensorFlow 2.0.

Last update: Dec 28, 2022

Overview

Smaller Multilingual Transformers

This repository shares smaller versions of multilingual transformers that keep the same representations as the original models. The idea came from a simple observation: after massively multilingual pretraining, not all embeddings are needed for fine-tuning and inference. In practice, one rarely needs a model that supports more than 100 languages, as the original mBERT does. We therefore extracted several smaller versions that handle fewer languages. Since most of the parameters of multilingual transformers are located in the embedding layer, our models are between 21% and 45% smaller.
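To make the observation about embeddings concrete, here is a rough back-of-the-envelope sketch. It assumes the public configuration of bert-base-multilingual-cased (a vocabulary of 119,547 WordPiece tokens and a hidden size of 768), which this README does not state, and it uses the rounded 178 million parameter count reported for mBERT in this README. The result is an estimate, not a measurement.

```python
# Rough estimate of how much of mBERT sits in the token embedding matrix.
# Assumptions (not taken from this README): vocabulary size and hidden size
# of the public bert-base-multilingual-cased configuration.
VOCAB_SIZE = 119_547
HIDDEN_SIZE = 768
TOTAL_PARAMS = 178_000_000  # rounded figure reported in this README

# One vector of HIDDEN_SIZE floats per vocabulary entry.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE
embedding_share = embedding_params / TOTAL_PARAMS

print(f"Token embedding parameters: {embedding_params:,}")
print(f"Share of all parameters: {embedding_share:.1%}")
```

Under these assumptions the token embeddings account for roughly half of all parameters, which is why dropping the rows of unused tokens shrinks the model so noticeably.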

The table below compares two of our extracted versions with the original mBERT. It shows each model's size, memory footprint, and accuracy on the XNLI dataset (cross-lingual transfer from English to French). These measurements were computed on a Google Cloud n1-standard-1 machine (1 vCPU, 3.75 GB).

Model	Parameters	Size on disk	Memory footprint	XNLI accuracy (%)
bert-base-multilingual-cased	178 million	714 MB	1400 MB	73.8
Geotrend/bert-base-15lang-cased	141 million	564 MB	1098 MB	74.1
Geotrend/bert-base-en-fr-cased	112 million	447 MB	878 MB	73.8

Reducing the size of multilingual transformers facilitates their deployment on public cloud platforms. For instance, Google Cloud Platform requires the model size on disk to be below 500 MB for serverless deployments (Cloud Functions / Cloud ML).
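As a quick sanity check, the sketch below takes the parameter counts and on-disk sizes from the table above and reports, for each model, the reduction relative to mBERT and whether it falls under the 500 MB limit mentioned above. The dictionary layout and the helper name `summarize` are purely illustrative.

```python
# Figures copied from the comparison table in this README.
models = {
    "bert-base-multilingual-cased": {"params_m": 178, "size_mb": 714},
    "Geotrend/bert-base-15lang-cased": {"params_m": 141, "size_mb": 564},
    "Geotrend/bert-base-en-fr-cased": {"params_m": 112, "size_mb": 447},
}
SIZE_LIMIT_MB = 500  # serverless limit quoted above

baseline = models["bert-base-multilingual-cased"]


def summarize(name):
    """Return (parameter reduction vs. mBERT, fits under the size limit)."""
    entry = models[name]
    reduction = 1 - entry["params_m"] / baseline["params_m"]
    return reduction, entry["size_mb"] < SIZE_LIMIT_MB


for name in models:
    reduction, fits = summarize(name)
    print(f"{name}: {reduction:.0%} fewer parameters, fits under limit: {fits}")
```

With these numbers, only the English-French model falls under 500 MB. The 15-language model is about 21% smaller than mBERT but, at 564 MB, still exceeds the limit.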

For more information, please refer to our paper: Load What You Need.

Available Models

So far, we have generated 70 smaller models from the original cased mBERT. These models have been uploaded to the Hugging Face Model Hub to make them easier to use: https://huggingface.co/Geotrend.

They can be downloaded easily using the transformers library:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-fr-cased")
model = AutoModel.from_pretrained("Geotrend/bert-base-en-fr-cased")

More models will be released soon.

Generating new Models

We also share a Python script that lets users generate smaller transformers on their own from a subset of the original vocabulary (the method is not limited to multilingual transformers):

pip install -r requirements.txt

python3 reduce_model.py \
	--source_model bert-base-multilingual-cased \
	--vocab_file vocab_5langs.txt \
	--output_model bert-base-5lang-cased \
	--convert_to_tf False

Where:

--source_model is the multilingual transformer to reduce
--vocab_file is the intended vocabulary file path
--output_model is the name of the final reduced model
--convert_to_tf tells the script whether to also generate a TensorFlow version
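The core idea behind reducing a model to a vocabulary subset can be illustrated with a toy example. The sketch below is not the code of reduce_model.py: it assumes that the reduction amounts to keeping the embedding rows of the retained tokens unchanged (consistent with the claim that the smaller models keep the same representations), and the function name `reduce_embeddings` and the toy data are hypothetical.

```python
def reduce_embeddings(vocab, embeddings, kept_tokens):
    """Keep only the embedding rows whose tokens appear in kept_tokens.

    vocab: list of tokens, where vocab[i] owns embeddings[i]
    embeddings: list of vectors, one per token
    kept_tokens: iterable of tokens to keep
    Returns (new_vocab, new_embeddings) in the original vocabulary order.
    """
    kept = set(kept_tokens)
    missing = kept - set(vocab)
    if missing:
        raise ValueError(f"Tokens not in the source vocabulary: {sorted(missing)}")

    new_vocab, new_embeddings = [], []
    for token, vector in zip(vocab, embeddings):
        if token in kept:
            new_vocab.append(token)
            new_embeddings.append(vector)  # the row is carried over unchanged
    return new_vocab, new_embeddings


# Toy data: six tokens with 2-dimensional embeddings (illustrative only).
vocab = ["[PAD]", "[UNK]", "hello", "bonjour", "hallo", "hola"]
embeddings = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.3], [0.4, 0.5], [0.6, 0.7], [0.8, 0.9]]

small_vocab, small_embeddings = reduce_embeddings(
    vocab, embeddings, ["[PAD]", "[UNK]", "hello", "bonjour"]
)
print(small_vocab)
print(small_embeddings)
```

Because each retained token keeps its original vector, the rest of the network sees the same inputs as before for any text covered by the smaller vocabulary. Only the token ids change, which is why the tokenizer vocabulary has to be rewritten alongside the embedding matrix.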

How to Cite

@inproceedings{smallermbert,
  title={Load What You Need: Smaller Versions of Multilingual BERT},
  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
  booktitle={SustaiNLP / EMNLP},
  year={2020}
}

Contact

Please contact [email protected] with any questions, feedback, or requests.

Owner

Geotrend
