⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

Last update: Nov 25, 2022

Overview

BERT-of-Theseus

Code for paper "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing".

BERT-of-Theseus is a new compressed BERT by progressively replacing the components of the original BERT.

Citation

If you use this code in your research, please cite our paper:

@inproceedings{xu-etal-2020-bert,
    title = "{BERT}-of-Theseus: Compressing {BERT} by Progressive Module Replacing",
    author = "Xu, Canwen  and
      Zhou, Wangchunshu  and
      Ge, Tao  and
      Wei, Furu  and
      Zhou, Ming",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.633",
    pages = "7859--7869"
}

NEW: We have uploaded a script for making predictions on GLUE tasks and preparing for leaderboard submission. Check out here!

How to run BERT-of-Theseus

Requirement

Our code is built on huggingface/transformers. To use our code, you must clone and install huggingface/transformers.

Compress a BERT

You should fine-tune a predecessor model following the instruction from huggingface and then save it to a directory if you haven't done so.
Run compression following the examples below:

# For compression with a replacement scheduler
export GLUE_DIR=/path/to/glue_data
export TASK_NAME=MRPC

python ./run_glue.py \
  --model_name_or_path /path/to/saved_predecessor \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir "$GLUE_DIR/$TASK_NAME" \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --save_steps 50 \
  --num_train_epochs 15 \
  --output_dir /path/to/save_successor/ \
  --evaluate_during_training \
  --replacing_rate 0.3 \
  --scheduler_type linear \
  --scheduler_linear_k 0.0006

# For compression with a constant replacing rate
export GLUE_DIR=/path/to/glue_data
export TASK_NAME=MRPC

python ./run_glue.py \
  --model_name_or_path /path/to/saved_predecessor \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir "$GLUE_DIR/$TASK_NAME" \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --save_steps 50 \
  --num_train_epochs 15 \
  --output_dir /path/to/save_successor/ \
  --evaluate_during_training \
  --replacing_rate 0.5 \
  --steps_for_replacing 2500

For the detailed description of arguments, please refer to the source code.

Load Pretrained Model on MNLI

We provide a 6-layer pretrained model on MNLI as a general-purpose model, which can transfer to other sentence classification tasks, outperforming DistillBERT (with the same 6-layer structure) on six tasks of GLUE (dev set).

Method	MNLI	MRPC	QNLI	QQP	RTE	SST-2	STS-B
BERT-base	83.5	89.5	91.2	89.8	71.1	91.5	88.9
DistillBERT	79.0	87.5	85.3	84.9	59.9	90.7	81.2
BERT-of-Theseus	82.1	87.5	88.8	88.8	70.1	91.8	87.8

You can easily load our general-purpose model using huggingface/transformers.

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("canwenxu/BERT-of-Theseus-MNLI")

model = AutoModel.from_pretrained("canwenxu/BERT-of-Theseus-MNLI")

Bug Report and Contribution

If you'd like to contribute and add more tasks (only GLUE is available at this moment), please submit a pull request and contact me. Also, if you find any problem or bug, please report with an issue. Thanks!

Third-Party Implementations

We list some third-party implementations from the community here. Please kindly add your implementation to this list:

Tensorflow Implementation (tested on NER): https://github.com/qiufengyuyi/bert-of-theseus-tf
Keras Implementation (tested on text classification): https://github.com/bojone/bert-of-theseus

⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

Related tags

Overview

BERT-of-Theseus

Citation

How to run BERT-of-Theseus

Requirement

Compress a BERT

Load Pretrained Model on MNLI

Bug Report and Contribution

Third-Party Implementations

Owner

Kevin Canwen Xu

Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations

Beyond Paragraphs: NLP for Long Sequences

ConferencingSpeech2022; Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge

Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

LCG T-TEST USING EUCLIDEAN METHOD

A cross platform OCR Library based on PaddleOCR & OnnxRuntime

Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers

A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Repo for Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization

spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines

Türkçe küfürlü içerikleri bulan bir yapay zeka kütüphanesi / An ML library for profanity detection in Turkish sentences

"Investigating the Limitations of Transformers with Simple Arithmetic Tasks", 2021

Statistics and Mathematics for Machine Learning, Deep Learning , Deep NLP

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

Correctly generate plurals, ordinals, indefinite articles; convert numbers to words

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN

An official repository for tutorials of Probabilistic Modelling and Reasoning (2021/2022) - a University of Edinburgh master's course.

Easy, fast, effective, and automatic g-code compression!

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

The official implementation of "BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?, ACL 2021 main conference"