This is the code for Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

Last update: Nov 15, 2022

Overview

This is the code for Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

It includes /bert, which is the original BERT repository modified to support weight pruning. The modified code also uses gradient checkpointing, which can be disabled by setting the Unix environment variable DISABLE_GRAD_CHECKPOINT=True. This only works during fine-tuning, not during pre-training.
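As a minimal sketch of the setting described above, the variable can be exported in the shell session that launches fine-tuning. The variable name and value come from this README; everything else here is an assumption. In particular, jobs submitted through a grid engine may not inherit your shell environment, depending on how the submitter is configured.

```shell
# Disable gradient checkpointing for commands started from this shell session.
export DISABLE_GRAD_CHECKPOINT=True

# Confirm that the variable is visible to child processes.
sh -c 'echo "DISABLE_GRAD_CHECKPOINT=$DISABLE_GRAD_CHECKPOINT"'

# To turn gradient checkpointing back on later, remove the variable again:
# unset DISABLE_GRAD_CHECKPOINT
```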

I am currently in the process of converting these experiments into a ducttape workflow, so things are a little unstable right now.

Things that have not been converted to ducttape:

Anything in tables/
Anything in graphs/

If you need all the experiments from the paper, check out this commit. It's very messy, so be prepared to read the code. I will not be releasing a guide to run that code, since it will be made obsolete by the ducttape workflow.

Configuration

pip install -r requirements.txt

To pre-train, you will need a GPU with at least 12 GB of memory. I've been using Titan RTXs via Univa Grid Engine. If you don't want to use this setup, you will need to modify tapes/submitters.tape and/or main.tconf.

You'll also need the Wikipedia corpus and BookCorpus, which can be retrieved with scripts/download_wiki.sh and scripts/download_bookcorpus.sh, respectively. GLUE data can be retrieved by running scripts/get_glue.py.
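The three download steps might be run together roughly as follows. This is only a sketch: it assumes the scripts are run from the repository root, take no arguments, and are invoked with bash and python, none of which is confirmed here. The run_step helper is hypothetical and not part of the repository; it skips any script it cannot find.

```shell
ran=0
skipped=0

# Hypothetical helper: run a script with the given interpreter if it exists.
# $1 = interpreter, $2 = script path (relative to the repository root)
run_step() {
  if [ -f "$2" ]; then
    "$1" "$2" && ran=$((ran + 1))
  else
    echo "skipping $2 (not found; run this from the repository root)"
    skipped=$((skipped + 1))
  fi
}

run_step bash scripts/download_wiki.sh         # Wikipedia corpus
run_step bash scripts/download_bookcorpus.sh   # BookCorpus
run_step python scripts/get_glue.py            # GLUE data

echo "ran=$ran skipped=$skipped"
```

Check each script for required arguments or output paths before relying on this.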

You will need to update tapes/link_data.tape to point to dataset locations.

You will also need to update main.tconf to point to the location of your repository on disk (so ducttape knows where to find packages).

AFAIK, no one besides me has used this code. If you have trouble, please open an issue and I'll do what I can to help out.

Most experiments are run using

ducttape main.tape -C main.tconf -p main

Owner

Mitchell Gordon
