Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)

Overview

Structured Super Lottery Tickets in BERT

This repo contains our codes for the paper "Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization" (ACL 2021).


Getting Start

  1. python3.6
    Reference to download and install : https://www.python.org/downloads/release/python-360/
  2. install requirements
    > pip install -r requirements.txt

Data

  1. Download data
    sh download.sh
    Please refer to download GLUE dataset: https://gluebenchmark.com/
  2. Preprocess data
    > sh experiments/glue/prepro.sh
    For more data processing details, please refer to this repo.

Verifying Phase Transition Phenomenon

  1. Fine-tune a pre-trained BERT model with single task data, compute importance scores, and generate one-shot structured pruning masks at multiple sparsity levels. E.g., for MNLI, run

    ./scripts/train_mnli.sh GPUID
    
  2. Rewind and evaluate the winning, random, and losing tickets at multiple sparsity levels. E.g., for MNLI, run

    ./scripts/rewind_mnli.sh GPUID
    

You may try tasks with smaller sizes (e.g., SST, MRPC, RTE) to see a more pronounced phase transition.


Multi-task Learning (MTL) with Tickets Sharing

  1. Identify a set of super tickets for each individual task.

    • Identify winning tickets at multiple sparsity levels for each individual task. E.g., for MTDNN-base, run

      ./scripts/prepare_mtdnn_base.sh GPUID
      

      We recommend to use the same optimization settings, e.g., learning rate, optimizer and random seed, in both the ticket identification procedures and the MTL. We empirically observe that the super tickets perform better in MTL in such a case.

    • [Optional] For each individual task, identify a set of super tickets from the winning tickets at multiple sparsity levels. You can skip this step if you wish to directly use the set of super tickets identified by us. If you wish to identify super tickets on your own (This is recommended if you use a different optimization settings, e.g., learning rate, optimizer and random seed, from those in our scripts. These factors may affect the candidacy of super tickets.), we provide the template scripts

      ./scripts/rewind_mnli_winning.sh GPUID
      ./scripts/rewind_qnli_winning.sh GPUID
      ./scripts/rewind_qqp_winning.sh GPUID
      ./scripts/rewind_sst_winning.sh GPUID
      ./scripts/rewind_mrpc_winning.sh GPUID
      ./scripts/rewind_cola_winning.sh GPUID
      ./scripts/rewind_stsb_winning.sh GPUID
      ./scripts/rewind_rte_winning.sh GPUID
      

      These scripts rewind the winning tickets at multiple sparsity levels. You can manually identify the set of super tickets as the set of winning tickets that perform the best among all sparsity levels.

  2. Construct multi-task super tickets by aggregating the identified sets of super tickets of all tasks. E.g., to use the super tickets identified by us, run

    python construct_mtl_mask.py
    

    You can modify the script to use the super tickets identified by yourself.

  3. MTL with tickets sharing. Run

    ./scripts/train_mtdnn.sh GPUID
    

MTL Benchmark

MTL evaluation results on GLUE dev set averaged over 5 random seeds.

Model MNLI-m/mm (Acc) QNLI (Acc) QQP (Acc/F1) SST-2 (Acc) MRPC (Acc/F1) CoLA (Mcc) STS-B (P/S) RTE (Acc) Avg Score Avg Compression
MTDNN, base 84.6/84.2 90.5 90.6/87.4 92.2 80.6/86.2 54.0 86.2/86.4 79.0 82.4 100%
Tickets-Share, base 84.5/84.1 91.0 90.7/87.5 92.7 87.0/90.5 52.0 87.7/87.5 81.2 83.3 92.9%
MTDNN, large 86.5/86.0 92.2 91.2/88.1 93.5 85.2/89.4 56.2 87.2/86.9 83.0 84.4 100%
Tickets-Share, large 86.7/86.0 92.1 91.3/88.4 93.2 88.4/91.5 61.8 89.2/89.1 80.5 85.4 83.3%

Citation

@article{liang2021super,
  title={Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization},
  author={Liang, Chen and Zuo, Simiao and Chen, Minshuo and Jiang, Haoming and Liu, Xiaodong and He, Pengcheng and Zhao, Tuo and Chen, Weizhu},
  journal={arXiv preprint arXiv:2105.12002},
  year={2021}
}

@article{liu2020mtmtdnn,
  title={The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding},
  author={Liu, Xiaodong and Wang, Yu and Ji, Jianshu and Cheng, Hao and Zhu, Xueyun and Awa, Emmanuel and He, Pengcheng and Chen, Weizhu and Poon, Hoifung and Cao, Guihong and Jianfeng Gao},
  journal={arXiv preprint arXiv:2002.07972},
  year={2020}
}

Contact Information

For help or issues related to this package, please submit a GitHub issue. For personal questions related to this paper, please contact Chen Liang ([email protected]).

Owner
Chen Liang
Chen Liang
A raytrace framework using taichi language

ti-raytrace The code use Taichi programming language Current implement acceleration lvbh disney brdf How to run First config your anaconda workspace,

蕉太狼 73 Dec 11, 2022
A minimal Conformer ASR implementation adapted from ESPnet.

Conformer ASR A minimal Conformer ASR implementation adapted from ESPnet. Introduction I want to use the pre-trained English ASR model provided by ESP

Niu Zhe 3 Jan 24, 2022
Source code for AAAI20 "Generating Persona Consistent Dialogues by Exploiting Natural Language Inference".

Generating Persona Consistent Dialogues by Exploiting Natural Language Inference Source code for RCDG model in AAAI20 Generating Persona Consistent Di

16 Oct 08, 2022
Production First and Production Ready End-to-End Keyword Spotting Toolkit

Production First and Production Ready End-to-End Keyword Spotting Toolkit

223 Jan 02, 2023
Code for paper: An Effective, Robust and Fairness-awareHate Speech Detection Framework

BiQQLSTM_HS Code and data for paper: Title: An Effective, Robust and Fairness-awareHate Speech Detection Framework. Authors: Guanyi Mou and Kyumin Lee

Guanyi Mou 2 Dec 27, 2022
Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipB

Jie Lei 雷杰 612 Jan 04, 2023
Transformer training code for sequential tasks

Sequential Transformer This is a code for training Transformers on sequential tasks such as language modeling. Unlike the original Transformer archite

Meta Research 578 Dec 13, 2022
AI Assistant for Building Reliable, High-performing and Fair Multilingual NLP Systems

AI Assistant for Building Reliable, High-performing and Fair Multilingual NLP Systems

Microsoft 37 Nov 29, 2022
This is the writeup of all the challenges from Advent-of-cyber-2019 of TryHackMe

Advent-of-cyber-2019-writeup This is the writeup of all the challenges from Advent-of-cyber-2019 of TryHackMe https://tryhackme.com/shivam007/badges/c

shivam danawale 5 Jul 17, 2022
Plugin repository for Macast

Macast-plugins Plugin repository for Macast. How to use third-party player plugin Download Macast from GitHub Release. Download the plugin you want fr

109 Jan 04, 2023
This is a project built for FALLABOUT2021 event under SRMMIC, This project deals with NLP poetry generation.

FALLABOUT-SRMMIC 21 POETRY-GENERATION HINGLISH DESCRIPTION We have developed a NLP(natural language processing) model which automatically generates a

7 Sep 28, 2021
Auto-researching tool generating word documents.

About ResearchTE automates researching by generating document with answers to given questions. Supports getting results from: Google DuckDuckGo (with

1 Feb 14, 2022
Various capabilities for static malware analysis.

Malchive The malchive serves as a compendium for a variety of capabilities mainly pertaining to malware analysis, such as scripts supporting day to da

MITRE Cybersecurity 64 Nov 22, 2022
Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention

Sinkhorn Transformer This is a reproduction of the work outlined in Sparse Sinkhorn Attention, with additional enhancements. It includes a parameteriz

Phil Wang 217 Nov 25, 2022
An example project using OpenPrompt under pytorch-lightning for prompt-based SST2 sentiment analysis model

pl_prompt_sst An example project using OpenPrompt under the framework of pytorch-lightning for a training prompt-based text classification model on SS

Zhiling Zhang 5 Oct 21, 2022
Phomber is infomation grathering tool that reverse search phone numbers and get their details, written in python3.

A Infomation Grathering tool that reverse search phone numbers and get their details ! What is phomber? Phomber is one of the best tools available fo

S41R4J 121 Dec 27, 2022
Indonesia spellchecker with python

indonesia-spellchecker Ganti kata yang terdapat pada file teks.txt untuk diperiksa kebenaran kata. Run on local machine python3 main.py

Rahmat Agung Julians 1 Sep 14, 2022
A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

Reddit text to speech generator A basic reddit tts video generator Current functionality Generate videos for subs based on comments,(askreddit) so rea

Aadvik 17 Dec 19, 2022
chaii - hindi & tamil question answering

chaii - hindi & tamil question answering This is the solution for rank 5th in Kaggle competition: chaii - Hindi and Tamil Question Answering. The comp

abhishek thakur 33 Dec 18, 2022