Awesome Efficient PLM Papers

Must-read papers on improving efficiency for pre-trained language models.

The paper list is mainly maintained by Lei Li and Shuhuai Ren.

Knowledge Distillation

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter NeurIPS 2019 Workshop

Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf [pdf] [project]
Patient Knowledge Distillation for BERT Model Compression EMNLP 2019

Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu [pdf] [project]
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models Preprint

Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova [pdf] [project]
TinyBERT: Distilling BERT for Natural Language Understanding Findings of EMNLP 2020

Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu [pdf] [project]
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing EMNLP 2020

Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou [pdf] [project]
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers NeurIPS 2020

Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou [pdf] [project]
BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance EMNLP 2020

Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin [pdf] [project]
MixKD: Towards Efficient Distillation of Large-scale Language Models ICLR 2021

Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin [pdf]
Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains ACL-IJCNLP 2021

Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Jun Huang [pdf]
MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation ACL-IJCNLP 2021

Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh [pdf]
Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor ACL-IJCNLP 2021

Xinyu Wang, Yong Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu [pdf] [project]
Weight Distillation: Transferring the Knowledge in Neural Network Parameters ACL-IJCNLP 2021

Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu [pdf]
Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation ACL-IJCNLP 2021

Yuanxin Liu, Fandong Meng, Zheng Lin, Weiping Wang, Jie Zhou [pdf]
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers Findings of ACL-IJCNLP 2021

Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei [pdf] [project]
One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers Findings of ACL-IJCNLP 2021

Chuhan Wu, Fangzhao Wu, Yongfeng Huang [pdf]

Dynamic Early Exiting

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference ACL 2020

Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin [pdf] [project]
FastBERT: a Self-distilling BERT with Adaptive Inference Time ACL 2020

Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju [pdf] [project]
The Right Tool for the Job: Matching Model and Instance Complexities ACL 2020

Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith [pdf] [project]
A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models NAACL 2021

Kaiyuan Liao, Yi Zhang, Xuancheng Ren, Qi Su, Xu Sun, Bin He [pdf] [project]
CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade Preprint

Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun [pdf] [project]
Early Exiting BERT for Efficient Document Ranking SustaiNLP 2020

Ji Xin, Rodrigo Nogueira, Yaoliang Yu, Jimmy Lin [pdf] [project]
BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression EACL 2021

Ji Xin, Raphael Tang, Yaoliang Yu, Jimmy Lin [pdf] [project]
Accelerating BERT Inference for Sequence Labeling via Early-Exit ACL 2021

Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing Huang [pdf] [project]
BERT Loses Patience: Fast and Robust Inference with Early Exit NeurIPS 2020

Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei [pdf] [project]
Early Exiting with Ensemble Internal Classifiers Preprint

Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu [pdf]

Quantization

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT AAAI 2020

Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer [pdf] [project]
TernaryBERT: Distillation-aware Ultra-low Bit BERT EMNLP 2020

Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu [pdf] [project]
Q8BERT: Quantized 8Bit BERT NeurIPS 2019 Workshop

Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat [pdf] [project]
BinaryBERT: Pushing the Limit of BERT Quantization ACL-IJCNLP 2021

Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King [pdf] [project]
I-BERT: Integer-only BERT Quantization ICML 2021

Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer [pdf] [project]

Pruning

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned ACL 2019

Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov [pdf] [project]
Are Sixteen Heads Really Better than One? NeurIPS 2019

Paul Michel, Omer Levy, Graham Neubig [pdf] [project]
The Lottery Ticket Hypothesis for Pre-trained BERT Networks NeurIPS 2020

Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin [pdf] [project]
Movement Pruning: Adaptive Sparsity by Fine-Tuning NeurIPS 2020

Victor Sanh, Thomas Wolf, Alexander M. Rush [pdf] [project]
Reducing Transformer Depth on Demand with Structured Dropout Preprint

Angela Fan, Edouard Grave, Armand Joulin [pdf]
When BERT Plays the Lottery, All Tickets Are Winning EMNLP 2020

Sai Prasanna, Anna Rogers, Anna Rumshisky [pdf] [project]
Structured Pruning of a BERT-based Question Answering Model Preprint

J.S. McCarley, Rishav Chakravarti, Avirup Sil [pdf]
Structured Pruning of Large Language Models EMNLP 2020

Ziheng Wang, Jeremy Wohlwend, Tao Lei [pdf] [project]
Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm NAACL 2021

Dongkuan Xu, Ian E.H. Yen, Jinxi Zhao, Zhibin Xiao [pdf]
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization ACL 2021

Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen [pdf] [project]

Contribution

If you find any related work not included in the list, do not hesitate to raise a PR to help us complete the list.