I-BERT: Integer-only BERT Quantization

Overview

Screen Shot 2020-12-19 at 9 51 50 PM

I-BERT: Integer-only BERT Quantization

HuggingFace Implementation

I-BERT is also available in the master branch of HuggingFace! Visit the following links for the HuggingFace implementation.

Github Link: https://github.com/huggingface/transformers/tree/master/src/transformers/models/ibert

Model Links:

Installation & Requirements

You can find more detailed installation guides from the Fairseq repo: https://github.com/pytorch/fairseq

1. Fairseq Installation

Reference: Fairseq

  • PyTorch version >= 1.4.0
  • Python version >= 3.6
  • Currently, I-BERT only supports training on GPU
git clone https://github.com/kssteven418/I-BERT.git
cd I-BERT
pip install --editable ./

2. Download pre-trained RoBERTa models

Reference: Fairseq RoBERTa

Download pretrained RoBERTa models from the links and unzip them.

# In I-BERT (root) directory
mkdir models && cd models
wget {link}
tar -xvf roberta.{base|large}.tar.gz

3. Download GLUE datasets

Reference: Fairseq Finetuning on GLUE

First, download the data from the GLUE website. Make sure to download the dataset in I-BERT (root) directory.

# In I-BERT (root) directory
wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
python download_glue_data.py --data_dir glue_data --tasks all

Then, preprocess the data.

# In I-BERT (root) directory
./examples/roberta/preprocess_GLUE_tasks.sh glue_data {task_name}

task_name can be one of the following: {ALL, QQP, MNLI, QNLI, MRPC, RTE, STS-B, SST-2, CoLA} . ALL will preprocess all the tasks. If the command is run propely, preprocessed datasets will be stored in I-BERT/{task_name}-bin

Now, you have the models and the datasets ready, so you are ready to run I-BERT!

Task-specific Model Finetuning

Before quantizing the model, you first have to finetune the pre-trained models to a specific downstream task. Although you can finetune the model from the original Fairseq repo, we provide ibert-base branch where you can train non-quantized models without having to install the original Fairseq. This branch is identical to the master branch of the original Fairseq repo, except for some loggings and run scripts that are irrelevant to the functionality. If you already have finetuned models, you can skip this part.

Run the following commands to fetch and move to the ibert-base branch:

# In I-BERT (root) directory
git fetch
git checkout -t origin/ibert-base

Then, run the script:

# In I-BERT (root) directory
# CUDA_VISIBLE_DEVICES={device} python run.py --arch {roberta_base|roberta_large} --task {task_name}
CUDA_VISIBLE_DEVICES=0 python run.py --arch roberta_base --task MRPC

Checkpoints and validation logs will be stored at ./outputs directory. You can change this output location by adding the option --output-dir OUTPUT_DIR. The exact output location will look something like: ./outputs/none/MRPC-base/wd0.1_ad0.1_d0.1_lr2e-5/1219-101427_ckpt/checkpoint_best.pt. By default, models are trained according to the task-specific hyperparameters specified in Fairseq Finetuning on GLUE. However, you can also specify the hyperparameters with the options (use the option -h for more details).

Quantiation & Quantization-Aware-Finetuning

Now, we come back to ibert branch for quantization.

git checkout ibert

And then run the script. This will first quantize the model and do quantization-aware-finetuning with the learning rate that you specify with the option --lr {lr}.

# In I-BERT (root) directory
# CUDA_VISIBLE_DEVICES={device} python run.py --arch {roberta_base|roberta_large} --task {task_name} \
# --restore-file {ckpt_path} --lr {lr}
CUDA_VISIBLE_DEVICES=0 python run.py --arch roberta_base --task MRPC --restore-file ckpt-best.pt --lr 1e-6

NOTE: Our work is still on progress. Currently, all integer operations are executed with floating point.

codes for paper Combining Dynamic Local Context Focus and Dependency Cluster Attention for Aspect-level sentiment classification

DLCF-DCA codes for paper Combining Dynamic Local Context Focus and Dependency Cluster Attention for Aspect-level sentiment classification. submitted t

15 Aug 30, 2022
TransGAN: Two Transformers Can Make One Strong GAN

[Preprint] "TransGAN: Two Transformers Can Make One Strong GAN", Yifan Jiang, Shiyu Chang, Zhangyang Wang

VITA 1.5k Jan 07, 2023
DockStream: A Docking Wrapper to Enhance De Novo Molecular Design

DockStream Description DockStream is a docking wrapper providing access to a collection of ligand embedders and docking backends. Docking execution an

AstraZeneca - Molecular AI 72 Jan 02, 2023
Final report with code for KAIST Course KSE 801.

Orthogonal collocation is a method for the numerical solution of partial differential equations

Chuanbo HUA 4 Apr 06, 2022
Symbolic Music Generation with Diffusion Models

Symbolic Music Generation with Diffusion Models Supplementary code release for our work Symbolic Music Generation with Diffusion Models. Installation

Magenta 119 Jan 07, 2023
Julia package for contraction of tensor networks, based on the sweep line algorithm outlined in the paper General tensor network decoding of 2D Pauli codes

Julia package for contraction of tensor networks, based on the sweep line algorithm outlined in the paper General tensor network decoding of 2D Pauli codes

Christopher T. Chubb 35 Dec 21, 2022
A Pytorch implementation of "LegoNet: Efficient Convolutional Neural Networks with Lego Filters" (ICML 2019).

LegoNet This code is the implementation of ICML2019 paper LegoNet: Efficient Convolutional Neural Networks with Lego Filters Run python train.py You c

YangZhaohui 140 Sep 26, 2022
"Learning and Analyzing Generation Order for Undirected Sequence Models" in Findings of EMNLP, 2021

undirected-generation-dev This repo contains the source code of the models described in the following paper "Learning and Analyzing Generation Order f

Yichen Jiang 0 Mar 25, 2022
Testing and Estimation of structural breaks in Stata

xtbreak estimating and testing for many known and unknown structural breaks in time series and panel data. For an overview of xtbreak test see xtbreak

Jan Ditzen 13 Jun 19, 2022
Evolution Strategies in PyTorch

Evolution Strategies This is a PyTorch implementation of Evolution Strategies. Requirements Python 3.5, PyTorch = 0.2.0, numpy, gym, universe, cv2 Wh

Andrew Gambardella 333 Nov 14, 2022
Multi-Scale Progressive Fusion Network for Single Image Deraining

Multi-Scale Progressive Fusion Network for Single Image Deraining (MSPFN) This is an implementation of the MSPFN model proposed in the paper (Multi-Sc

Kuijiang 128 Nov 21, 2022
An official PyTorch implementation of the TKDE paper "Self-Supervised Graph Representation Learning via Topology Transformations".

Self-Supervised Graph Representation Learning via Topology Transformations This repository is the official PyTorch implementation of the following pap

Hsiang Gao 2 Oct 31, 2022
PyTorch implementation for Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition.

Stochastic CSLR This is the PyTorch implementation for the ECCV 2020 paper: Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuou

Zhe Niu 28 Dec 19, 2022
Official implementation of Neural Bellman-Ford Networks (NeurIPS 2021)

NBFNet: Neural Bellman-Ford Networks This is the official codebase of the paper Neural Bellman-Ford Networks: A General Graph Neural Network Framework

MilaGraph 136 Dec 21, 2022
[TPAMI 2021] iOD: Incremental Object Detection via Meta-Learning

Incremental Object Detection via Meta-Learning To appear in an upcoming issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence (T

Joseph K J 66 Jan 04, 2023
Implementation of UNet on the Joey ML framework

Independent Research Project - Code Joey can be cloned from here https://github.com/devitocodes/joey/. Devito and other dependencies such as PyTorch a

Navjot Kukreja 1 Oct 21, 2021
This is a file about Unet implemented in Pytorch

Unet this is an implemetion of Unet in Pytorch and it's architecture is as follows which is the same with paper of Unet component of Unet Convolution

Dragon 1 Dec 03, 2021
A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution.

Awesome Pretrained StyleGAN2 A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution. Note the readme is a

Justin 1.1k Dec 24, 2022
This program will stylize your photos with fast neural style transfer.

Neural Style Transfer (NST) Using TensorFlow Demo TensorFlow TensorFlow is an end-to-end open source platform for machine learning. It has a comprehen

Ismail Boularbah 1 Aug 08, 2022