Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Last update: Nov 10, 2022

Overview

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Requirements

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

pip install -r requirements.txt

Download checkpoints

Download the vocabulary file of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the pre-trained checkpoint of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the 2nd general distillation checkpoint of TinyBERT from HERE, and extract them into ./pretrained_ckpt/.

Prepare dataset

Download the latest dump of Wikipedia from HERE, and extract it into ./dataset/pretrain_data/download_wikipedia/.
Download a mirror of BooksCorpus from HERE, and extract it into ./dataset/pretrain_data/download_bookcorpus/.

- Pre-training data

bash create_pretrain_data.sh
bash create_pretrain_feature.sh

The features of Wikipedia, BooksCorpus, and their concatenation will be saved into ./dataset/pretrain_data/wikipedia_nomask/, ./dataset/pretrain_data/bookcorpus_nomask/, and ./dataset/pretrain_data/wiki_book_nomask/, respectively.

- Fine-tuning data

Download the GLUE dataset using the script in HERE, and put the files into ./dataset/glue/.
Download the SQuAD v1.1 and v2.0 datasets from the following links:

and put them into ./dataset/squad/.

Pre-train the supernet

bash pretrain_supernet.sh

The checkpoints will be saved into ./exp/pretrain/supernet/, and the names of the sub-directories should be modified into stage1_2 and stage3 correspondingly.

We also provide the checkpoint of the supernet in stage 3 (pre-trained with both Wikipedia and BooksCorpus) at HERE.

Train the teacher model (BERT$_{\rm BASE}$)

bash train.sh

The checkpoints will be saved into ./exp/train/bert_base/, and the names of the sub-directories should be modified into the corresponding task name (i.e., mnli, qqp, qnli, sst-2, cola, sts-b, mrpc, rte, wnli, squad1.1, and squad2.0). Each sub-directory contains a checkpoint named best_model.bin.

Conduct NAS (including search stage 1, 2, and 3)

bash ffn_search.sh

The checkpoints will be saved into ./exp/ffn_search/.

Distill the student model

- TinyBERT$_4$, TinyBERT$_6$

bash finetune.sh

The checkpoints will be saved into ./exp/downstream/tiny_bert/.

- EfficientBERT$_{\rm TINY}$, EfficientBERT, EfficientBERT+, EfficientBERT++

bash nas_finetune.sh

The above script will first pre-train the student models based on the pre-trained checkpoint of the supernet in stage 3, and save the pre-trained checkpoints into ./exp/pretrain/auto_bert/. Then fine-tune it on the downstream datasets, and save the fine-tuned checkpoints into ./exp/downstream/auto_bert/.

We also provide the pre-trained checkpoints of the student models (including EfficientBERT$_{\rm TINY}$, EfficientBERT, and EfficientBERT++) at HERE.

- EfficientBERT (TinyBERT$_6$)

bash nas_finetune_transfer.sh

The pre-trained and fine-tuned checkpoints will be saved into ./exp/pretrain/auto_tiny_bert/ and ./exp/downstream/auto_tiny_bert/, respectively.

Test on the GLUE dataset

bash test.sh

The test results will be saved into ./test_results/.

Reference

If you find this code helpful for your research, please cite the following paper.

@inproceedings{dong2021efficient-bert,
  title     = {{E}fficient{BERT}: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation},
  author    = {Chenhe Dong and Guangrun Wang and Hang Xu and Jiefeng Peng and Xiaozhe Ren and Xiaodan Liang},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2021},
  year      = {2021}
}

Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Related tags

Overview

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

Requirements

Download checkpoints

Prepare dataset

- Pre-training data

- Fine-tuning data

Pre-train the supernet

Train the teacher model (BERT$_{\rm BASE}$)

Conduct NAS (including search stage 1, 2, and 3)

Distill the student model

- TinyBERT$_4$, TinyBERT$_6$

- EfficientBERT$_{\rm TINY}$, EfficientBERT, EfficientBERT+, EfficientBERT++

- EfficientBERT (TinyBERT$_6$)

Test on the GLUE dataset

Reference

Owner

Chenhe Dong

Reproduction process of BERT on SST2 dataset

NLP library designed for reproducible experimentation management

MicBot - MicBot uses Google Translate to speak everyone's chat messages

Jarvis is a simple Chatbot with a GUI capable of chatting and retrieving information and daily news from the internet for it's user.

Pretrained Japanese BERT models

The ability of computer software to identify words and phrases in spoken language and convert them to human-readable text

Installation, test and evaluation of Scribosermo speech-to-text engine

Code for "Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments".

LSTM model - IMDB review sentiment analysis

A fast, efficient universal vector embedding utility package.

End-to-end MLOps pipeline of a BERT model for emotion classification.

Code for "Generative adversarial networks for reconstructing natural images from brain activity".

This is the writeup of all the challenges from Advent-of-cyber-2019 of TryHackMe

Machine learning models from Singapore's NLP research community

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Neural text generators like the GPT models promise a general-purpose means of manipulating texts.

PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset

Contains the code and data for our #ICSE2022 paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences"

PyTorch implementation of NATSpeech: A Non-Autoregressive Text-to-Speech Framework

Write Alphabet, Words and Sentences with your eyes.