Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)

Overview

Structured Super Lottery Tickets in BERT

This repo contains our code for the paper "Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization" (ACL 2021).


Getting Started

  1. Python 3.6
    To download and install Python 3.6, see: https://www.python.org/downloads/release/python-360/
  2. Install requirements
    > pip install -r requirements.txt

Data

  1. Download data
    > sh download.sh
    For details on the GLUE dataset, see: https://gluebenchmark.com/
  2. Preprocess data
    > sh experiments/glue/prepro.sh
    For more data processing details, please refer to the MT-DNN repository.

Verifying the Phase Transition Phenomenon

  1. Fine-tune a pre-trained BERT model on single-task data, compute importance scores, and generate one-shot structured pruning masks at multiple sparsity levels (a sketch of the scoring and masking appears after these steps). E.g., for MNLI, run

    ./scripts/train_mnli.sh GPUID
    
  2. Rewind and evaluate the winning, random, and losing tickets at multiple sparsity levels. E.g., for MNLI, run

    ./scripts/rewind_mnli.sh GPUID
    

You may try smaller tasks (e.g., SST, MRPC, RTE) to see a more pronounced phase transition.
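For concreteness, here is a minimal PyTorch sketch of the two steps above. It is an illustration under assumptions, not the repo's actual implementation: it assumes a HuggingFace-style BERT classifier whose forward() accepts a head_mask and returns an output with a .loss attribute, scores each attention head by first-order Taylor sensitivity |dL/dm * m|, and builds one-shot winning/losing/random masks at a given sparsity level.

    # Hedged sketch only: assumes a HuggingFace-style model whose forward()
    # accepts `head_mask` and returns an object with a .loss attribute.
    import torch

    def head_importance(model, dataloader, num_layers, num_heads, device="cuda"):
        # One mask variable per attention head; all-ones leaves the forward
        # pass unchanged but exposes a gradient for each head.
        head_mask = torch.ones(num_layers, num_heads, device=device, requires_grad=True)
        scores = torch.zeros(num_layers, num_heads, device=device)
        model.to(device).eval()
        for batch in dataloader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = model(**batch, head_mask=head_mask).loss
            loss.backward()
            # First-order Taylor sensitivity |dL/dm * m|, accumulated over batches.
            scores += (head_mask.grad * head_mask).abs().detach()
            head_mask.grad = None
        return scores

    def one_shot_masks(scores, sparsity, seed=42):
        # Keep a (1 - sparsity) fraction of heads: winning keeps the most
        # important, losing the least important, random a uniform sample.
        flat, n = scores.flatten().cpu(), scores.numel()
        k = int(n * (1.0 - sparsity))
        winning = torch.zeros(n)
        winning[flat.topk(k).indices] = 1.0
        losing = torch.zeros(n)
        losing[flat.topk(k, largest=False).indices] = 1.0
        rand = torch.zeros(n)
        rand[torch.randperm(n, generator=torch.Generator().manual_seed(seed))[:k]] = 1.0
        return tuple(m.view(scores.shape) for m in (winning, losing, rand))

Rewinding (step 2) then amounts to reloading the pre-trained weights, applying one of these masks, and fine-tuning again at each sparsity level; comparing the winning, random, and losing curves reveals the phase transition.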


Multi-task Learning (MTL) with Tickets Sharing

  1. Identify a set of super tickets for each individual task.

    • Identify winning tickets at multiple sparsity levels for each individual task. E.g., for MTDNN-base, run

      ./scripts/prepare_mtdnn_base.sh GPUID
      

      We recommend using the same optimization settings (e.g., learning rate, optimizer, and random seed) in both the ticket identification procedure and MTL; we empirically observe that super tickets perform better in MTL when the settings match.

    • [Optional] For each individual task, identify a set of super tickets from the winning tickets at multiple sparsity levels. You can skip this step if you wish to directly use the set of super tickets identified by us. Identifying super tickets on your own is recommended if your optimization settings (e.g., learning rate, optimizer, random seed) differ from those in our scripts, since these factors may affect which tickets qualify as super tickets. We provide the template scripts

      ./scripts/rewind_mnli_winning.sh GPUID
      ./scripts/rewind_qnli_winning.sh GPUID
      ./scripts/rewind_qqp_winning.sh GPUID
      ./scripts/rewind_sst_winning.sh GPUID
      ./scripts/rewind_mrpc_winning.sh GPUID
      ./scripts/rewind_cola_winning.sh GPUID
      ./scripts/rewind_stsb_winning.sh GPUID
      ./scripts/rewind_rte_winning.sh GPUID
      

      These scripts rewind the winning tickets at multiple sparsity levels. You can then identify each task's super tickets as the winning tickets that perform best across all sparsity levels.

  2. Construct multi-task super tickets by aggregating the identified sets of super tickets across all tasks (one possible aggregation is sketched after this list). E.g., to use the super tickets identified by us, run

    python construct_mtl_mask.py
    

    You can modify the script to use super tickets you identified yourself.

  3. MTL with tickets sharing. Run

    ./scripts/train_mtdnn.sh GPUID
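
For intuition, below is a hypothetical sketch of what an aggregation script like construct_mtl_mask.py might do. The file names, mask shapes, and the union rule are illustrative assumptions rather than the repo's actual behavior: it loads one super-ticket head mask per task and keeps a head in the shared encoder if any task's super ticket retains it.

    # Hypothetical sketch of multi-task mask aggregation; file names, shapes,
    # and the union rule are assumptions, not the repo's actual behavior.
    import torch

    TASKS = ["mnli", "qnli", "qqp", "sst", "mrpc", "cola", "stsb", "rte"]

    def construct_mtl_mask(mask_dir="masks", num_layers=12, num_heads=12):
        # Elementwise union (logical OR) of the per-task {0, 1} head masks.
        mtl_mask = torch.zeros(num_layers, num_heads)
        for task in TASKS:
            task_mask = torch.load(f"{mask_dir}/{task}_super_ticket.pt")  # assumed path
            mtl_mask = torch.maximum(mtl_mask, task_mask)
        torch.save(mtl_mask, f"{mask_dir}/mtl_super_ticket.pt")
        return mtl_mask

The tickets-sharing MTL run (step 3) can then structure the shared encoder according to the aggregated masks before jointly fine-tuning on all tasks.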
    

MTL Benchmark

MTL evaluation results on the GLUE development set, averaged over 5 random seeds. Avg Score averages one score per task, with paired metrics (Acc/F1, Pearson/Spearman) averaged first; Mcc is the Matthews correlation coefficient.

Model                 | MNLI-m/mm (Acc) | QNLI (Acc) | QQP (Acc/F1) | SST-2 (Acc) | MRPC (Acc/F1) | CoLA (Mcc) | STS-B (P/S) | RTE (Acc) | Avg Score | Avg Compression
MTDNN, base           | 84.6/84.2       | 90.5       | 90.6/87.4    | 92.2        | 80.6/86.2     | 54.0       | 86.2/86.4   | 79.0      | 82.4      | 100%
Tickets-Share, base   | 84.5/84.1       | 91.0       | 90.7/87.5    | 92.7        | 87.0/90.5     | 52.0       | 87.7/87.5   | 81.2      | 83.3      | 92.9%
MTDNN, large          | 86.5/86.0       | 92.2       | 91.2/88.1    | 93.5        | 85.2/89.4     | 56.2       | 87.2/86.9   | 83.0      | 84.4      | 100%
Tickets-Share, large  | 86.7/86.0       | 92.1       | 91.3/88.4    | 93.2        | 88.4/91.5     | 61.8       | 89.2/89.1   | 80.5      | 85.4      | 83.3%

Citation

@article{liang2021super,
  title={Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization},
  author={Liang, Chen and Zuo, Simiao and Chen, Minshuo and Jiang, Haoming and Liu, Xiaodong and He, Pengcheng and Zhao, Tuo and Chen, Weizhu},
  journal={arXiv preprint arXiv:2105.12002},
  year={2021}
}

@article{liu2020mtmtdnn,
  title={The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding},
  author={Liu, Xiaodong and Wang, Yu and Ji, Jianshu and Cheng, Hao and Zhu, Xueyun and Awa, Emmanuel and He, Pengcheng and Chen, Weizhu and Poon, Hoifung and Cao, Guihong and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2002.07972},
  year={2020}
}

Contact Information

For help or issues related to this package, please submit a GitHub issue. For personal questions related to this paper, please contact Chen Liang ([email protected]).
