Awesome Efficient PLM Papers

Must-read papers on improving efficiency for pre-trained language models.

The paper list is mainly maintained by Lei Li and Shuhuai Ren.

Knowledge Distillation

  1. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter NeurIPS 2019 Workshop

    Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf [pdf] [project]

  2. Patient Knowledge Distillation for BERT Model Compression EMNLP 2019

    Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu [pdf] [project]

  3. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models Preprint

    Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova [pdf] [project]

  4. TinyBERT: Distilling BERT for Natural Language Understanding Findings of EMNLP 2020

    Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu [pdf] [project]

  5. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing EMNLP 2020

    Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou [pdf] [project]

  6. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers NeurIPS 2020

    Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou [pdf] [project]

  7. BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance EMNLP 2020

    Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin [pdf] [project]

  8. MixKD: Towards Efficient Distillation of Large-scale Language Models ICLR 2021

    Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin [pdf]

  9. Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains ACL-IJCNLP 2021

    Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Jun Huang [pdf]

  10. MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation ACL-IJCNLP 2021

    Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh [pdf]

  11. Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor ACL-IJCNLP 2021

    Xinyu Wang, Yong Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu [pdf] [project]

  12. Weight Distillation: Transferring the Knowledge in Neural Network Parameters ACL-IJCNLP 2021

    Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu [pdf]

  13. Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation ACL-IJCNLP 2021

    Yuanxin Liu, Fandong Meng, Zheng Lin, Weiping Wang, Jie Zhou [pdf]

  14. MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers Findings of ACL-IJCNLP 2021

    Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei [pdf] [project]

  15. One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers Findings of ACL-IJCNLP 2021

    Chuhan Wu, Fangzhao Wu, Yongfeng Huang [pdf]
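
Most of the papers above build on the classic soft-target objective of Hinton et al., popularized for BERT by DistilBERT. As a rough illustration, here is a minimal PyTorch sketch of that loss; the function name and default hyperparameters are illustrative choices, not taken from any listed paper's code.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a KL term that pulls the
    student's softened distribution toward the teacher's.

    Illustrative sketch only; hyperparameters are placeholders."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # The temperature**2 factor keeps gradient magnitudes comparable
    # across temperatures (Hinton et al., 2015).
    kd = F.kl_div(log_p_student, p_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

Methods such as TinyBERT and MiniLM extend this idea by also matching intermediate hidden states or self-attention distributions rather than final logits alone.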

Dynamic Early Exiting

  1. DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference ACL 2020

    Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin [pdf] [project]

  2. FastBERT: a Self-distilling BERT with Adaptive Inference Time ACL 2020

    Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju [pdf] [project]

  3. The Right Tool for the Job: Matching Model and Instance Complexities ACL 2020

    Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith [pdf] [project]

  4. A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models NAACL 2021

    Kaiyuan Liao, Yi Zhang, Xuancheng Ren, Qi Su, Xu Sun, Bin He [pdf] [project]

  5. CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade Preprint

    Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun [pdf] [project]

  6. Early Exiting BERT for Efficient Document Ranking SustaiNLP 2020

    Ji Xin, Rodrigo Nogueira, Yaoliang Yu, Jimmy Lin [pdf] [project]

  7. BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression EACL 2021

    Ji Xin, Raphael Tang, Yaoliang Yu, Jimmy Lin [pdf] [project]

  8. Accelerating BERT Inference for Sequence Labeling via Early-Exit ACL 2021

    Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing Huang [pdf] [project]

  9. BERT Loses Patience: Fast and Robust Inference with Early Exit NeurIPS 2020

    Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei [pdf] [project]

  10. Early Exiting with Ensemble Internal Classifiers Preprint

    Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu [pdf]
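
The common thread in this section is attaching lightweight classifiers to intermediate layers and halting as soon as a prediction looks confident. Below is a minimal sketch of entropy-based exiting in the spirit of DeeBERT and FastBERT; the class name and threshold are illustrative, and it assumes batch size 1 with transformer blocks that map hidden states to hidden states.

```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Toy encoder that can stop after any layer whose exit head is confident."""

    def __init__(self, layers, hidden_size, num_labels, entropy_threshold=0.3):
        super().__init__()
        self.layers = nn.ModuleList(layers)  # assumed: hidden -> hidden blocks
        self.exits = nn.ModuleList(
            [nn.Linear(hidden_size, num_labels) for _ in layers]
        )
        self.entropy_threshold = entropy_threshold

    @torch.no_grad()
    def forward(self, hidden):  # hidden: (1, seq_len, hidden_size)
        for layer, exit_head in zip(self.layers, self.exits):
            hidden = layer(hidden)
            logits = exit_head(hidden[:, 0])  # classify from the first token
            probs = torch.softmax(logits, dim=-1)
            entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
            if entropy.item() < self.entropy_threshold:
                return logits  # confident enough: exit early
        return logits  # fell through all layers
```

The papers above differ mainly in the exit criterion: prediction entropy (DeeBERT, FastBERT), patience across consecutive layers (BERT Loses Patience), or cascades of calibrated complete models (CascadeBERT).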

Quantization

  1. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT AAAI 2020

    Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer [pdf] [project]

  2. TernaryBERT: Distillation-aware Ultra-low Bit BERT EMNLP 2020

    Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu [pdf] [project]

  3. Q8BERT: Quantized 8Bit BERT NeurIPS 2019 Workshop

    Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat [pdf] [project]

  4. BinaryBERT: Pushing the Limit of BERT Quantization ACL-IJCNLP 2021

    Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King [pdf] [project]

  5. I-BERT: Integer-only BERT Quantization ICML 2021

    Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer [pdf] [project]
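
At the core of all these approaches is mapping float weights onto a small integer grid plus a scale factor. Below is a toy sketch of symmetric per-tensor 8-bit quantization; real systems such as Q8BERT or I-BERT add quantization-aware training and integer-only kernels, which this sketch omits.

```python
import torch

def quantize_int8(weight):
    """Map a float tensor to int8 values plus one per-tensor scale."""
    scale = weight.abs().max() / 127.0
    q = torch.clamp((weight / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.float() * scale

w = torch.randn(768, 768)  # e.g. one BERT-base weight matrix
q, scale = quantize_int8(w)
err = (w - dequantize(q, scale)).abs().max().item()
print(f"int8 storage, max abs reconstruction error: {err:.4f}")
```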

Pruning

  1. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned ACL 2019

    Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov [pdf] [project]

  2. Are Sixteen Heads Really Better than One? NeurIPS 2019

    Paul Michel, Omer Levy, Graham Neubig [pdf] [project]

  3. The Lottery Ticket Hypothesis for Pre-trained BERT Networks NeurIPS 2020

    Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin [pdf] [project]

  4. Movement Pruning: Adaptive Sparsity by Fine-Tuning NeurIPS 2020

    Victor Sanh, Thomas Wolf, Alexander M. Rush [pdf] [project]

  5. Reducing Transformer Depth on Demand with Structured Dropout ICLR 2020

    Angela Fan, Edouard Grave, Armand Joulin [pdf]

  6. When BERT Plays the Lottery, All Tickets Are Winning EMNLP 2020

    Sai Prasanna, Anna Rogers, Anna Rumshisky [pdf] [project]

  7. Structured Pruning of a BERT-based Question Answering Model Preprint

    J.S. McCarley, Rishav Chakravarti, Avirup Sil [pdf]

  8. Structured Pruning of Large Language Models EMNLP 2020

    Ziheng Wang, Jeremy Wohlwend, Tao Lei [pdf] [project]

  9. Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm NAACL 2021

    Dongkuan Xu, Ian E.H. Yen, Jinxi Zhao, Zhibin Xiao [pdf]

  10. Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization ACL 2021

    Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen [pdf] [project]
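
As a minimal illustration of the mechanics, here is magnitude pruning with PyTorch's built-in torch.nn.utils.prune utilities; the papers above explore smarter criteria (movement scores, attention-head importance, lottery-ticket rewinding), but the masking machinery looks broadly similar.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(768, 768)
# Zero out the 30% of weights with the smallest magnitude (L1 criterion).
prune.l1_unstructured(layer, name="weight", amount=0.3)
print("sparsity:", (layer.weight == 0).float().mean().item())
# Fold the mask into the weight tensor so the pruning becomes permanent.
prune.remove(layer, "weight")
```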

Contribution

If you find any related work not included in the list, do not hesitate to raise a PR to help us complete the list.
