This is the code for Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

Last update: Nov 15, 2022

Overview

This repository contains the code for the paper "Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning".

It includes /bert, which is the original BERT repository modified to support weight pruning and gradient checkpointing. Gradient checkpointing can be disabled by setting the Unix environment variable DISABLE_GRAD_CHECKPOINT=True. This only works during fine-tuning, not during pre-training.
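As a minimal sketch, the variable can be set in the shell before launching a fine-tuning run. Only the variable name and value come from this README; how the modified BERT code reads the variable is not shown here, so treat the rest as an assumption.

```shell
# Disable gradient checkpointing for this shell session.
# The README states this only works during fine-tuning.
export DISABLE_GRAD_CHECKPOINT=True

# Confirm the variable is exported, so child processes
# (for example a fine-tuning script) will see it.
env | grep '^DISABLE_GRAD_CHECKPOINT='
```

Because the variable is exported, any job started from the same shell inherits it. Jobs submitted through a grid scheduler may need the variable passed along explicitly.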

I am currently in the process of converting these experiments into a ducttape workflow, so things are a little unstable right now.

Things that have not been converted to ducttape:

Anything in tables/
Anything in graphs/

If you need all the experiments from the paper, check out this commit. It's very messy, so be prepared to read the code. I will not be releasing a guide to run that code, since it will be made obsolete by the ducttape workflow.

Configuration

pip install -r requirements.txt

To pre-train, you will need a GPU with at least 12 GB of memory. I've been using Titan RTXs via Univa Grid Engine. If you don't like this setup, you will need to modify tapes/submitters.tape and/or main.tconf.

You'll also need the Wikipedia corpus and BookCorpus, which can be retrieved with scripts/download_wiki.sh and scripts/download_bookcorpus.sh, respectively. GLUE data can be retrieved by running scripts/get_glue.py.

You will need to update tapes/link_data.tape to point to dataset locations.

You will also need to update main.tconf to point to the location of your repository on disk (so ducttape knows where to find packages).

AFAIK, no one besides me has used this code. If you have trouble, please open an issue and I'll do what I can to help out.

Most experiments are run using

ducttape main.tape -C main.tconf -p main
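Before launching, it can help to confirm that the files named in this README are where ducttape expects them. The sketch below assumes it is run from the repository root. check_repo_files is a hypothetical helper, not part of the repository; the file paths are the ones mentioned above.

```shell
#!/bin/sh
# Pre-flight check: report which files named in this README are missing.
# check_repo_files is a hypothetical helper, not part of the repository.
check_repo_files() {
    missing=0
    for f in requirements.txt main.tape main.tconf \
             tapes/submitters.tape tapes/link_data.tape \
             scripts/download_wiki.sh scripts/download_bookcorpus.sh \
             scripts/get_glue.py; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    # Return the number of missing files (0 means everything was found).
    return "$missing"
}

if check_repo_files; then
    echo "all expected files found"
else
    echo "fix the paths above before running ducttape"
fi
```

This only checks that the files exist. It does not verify that tapes/link_data.tape or main.tconf point at the right dataset and repository locations.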

Owner

Mitchell Gordon