The official implementation for "FQ-ViT: Fully Quantized Vision Transformer without Retraining".

Overview

FQ-ViT [arXiv]

This repo contains the official implementation of "FQ-ViT: Fully Quantized Vision Transformer without Retraining".

Table of Contents

Introduction

Transformer-based architectures have achieved competitive performance in various CV tasks. Compared to the CNNs, Transformers usually have more parameters and higher computational costs, presenting a challenge when deployed to resource-constrained hardware devices.

Most existing quantization approaches are designed and tested on CNNs and lack proper handling of Transformer-specific modules. Previous work found there would be significant accuracy degradation when quantizing LayerNorm and Softmax of Transformer-based architectures. As a result, they left LayerNorm and Softmax unquantized with floating-point numbers. We revisit these two exclusive modules of the Vision Transformers and discover the reasons for degradation. In this work, we propose the FQ-ViT, the first fully quantized Vision Transformer, which contains two specific modules: Powers-of-Two Scale (PTS) and Log-Int-Softmax (LIS).

Layernorm quantized with Powers-of-Two Scale (PTS)

These two figures below show that there exists serious inter-channel variation in Vision Transformers than CNNs, which leads to unacceptable quantization errors with layer-wise quantization.

Taking the advantages of both layer-wise and channel-wise quantization, we propose PTS for LayerNorm's quantization. The core idea of PTS is to equip different channels with different Powers-of-Two Scale factors, rather than different quantization scales.

Softmax quantized with Log-Int-Softmax (LIS)

The storage and computation of attention map is known as a bottleneck for transformer structures, so we want to quantize it to extreme lower bit-width (e.g. 4-bit). However, if directly implementing 4-bit uniform quantization, there will be severe accuracy degeneration. We observe a distribution centering at a fairly small value of the output of Softmax, while only few outliers have larger values close to 1. Based on the following visualization, Log2 preserves more quantization bins than uniform for the small value interval with dense distribution.

Combining Log2 quantization with i-exp, which is a polynomial approximation of exponential function presented by I-BERT, we propose LIS, an integer-only, faster, low consuming Softmax.

The whole process is visualized as follow.

Getting Started

Install

  • Clone this repo.
git clone https://github.com/linyang-zhh/FQ-ViT.git
cd FQ-ViT
  • Create a conda virtual environment and activate it.
conda create -n fq-vit python=3.7 -y
conda activate fq-vit
  • Install PyTorch and torchvision. e.g.,
conda install pytorch=1.7.1 torchvision cudatoolkit=10.1 -c pytorch

Data preparation

You should download the standard ImageNet Dataset.

├── imagenet
│   ├── train
|
│   ├── val

Run

Example: Evaluate quantized DeiT-S with MinMax quantizer and our proposed PTS and LIS

python test_quant.py deit_small <YOUR_DATA_DIR> --quant --pts --lis --quant-method minmax
  • deit_small: model architecture, which can be replaced by deit_tiny, deit_base, vit_base, vit_large, swin_tiny, swin_small and swin_base.

  • --quant: whether to quantize the model.

  • --pts: whether to use Power-of-Two Scale Integer Layernorm.

  • --lis: whether to use Log-Integer-Softmax.

  • --quant-method: quantization methods of activations, which can be chosen from minmax, ema, percentile and omse.

Results on ImageNet

This paper employs several current post-training quantization strategies together with our methods, including MinMax, EMA , Percentile and OMSE.

  • MinMax uses the minimum and maximum values of the total data as the clipping values;

  • EMA is based on MinMax and uses an average moving mechanism to smooth the minimum and maximum values of different mini-batch;

  • Percentile assumes that the distribution of values conforms to a normal distribution and uses the percentile to clip. In this paper, we use the 1e-5 percentile because the 1e-4 commonly used in CNNs has poor performance in Vision Transformers.

  • OMSE determines the clipping values by minimizing the quantization error.

The following results are evaluated on ImageNet.

Method W/A/Attn Bits ViT-B ViT-L DeiT-T DeiT-S DeiT-B Swin-T Swin-S Swin-B
Full Precision 32/32/32 84.53 85.81 72.21 79.85 81.85 81.35 83.20 83.60
MinMax 8/8/8 23.64 3.37 70.94 75.05 78.02 64.38 74.37 25.58
MinMax w/ PTS 8/8/8 83.31 85.03 71.61 79.17 81.20 80.51 82.71 82.97
MinMax w/ PTS, LIS 8/8/4 82.68 84.89 71.07 78.40 80.85 80.04 82.47 82.38
EMA 8/8/8 30.30 3.53 71.17 75.71 78.82 70.81 75.05 28.00
EMA w/ PTS 8/8/8 83.49 85.10 71.66 79.09 81.43 80.52 82.81 83.01
EMA w/ PTS, LIS 8/8/4 82.57 85.08 70.91 78.53 80.90 80.02 82.56 82.43
Percentile 8/8/8 46.69 5.85 71.47 76.57 78.37 78.78 78.12 40.93
Percentile w/ PTS 8/8/8 80.86 85.24 71.74 78.99 80.30 80.80 82.85 83.10
Percentile w/ PTS, LIS 8/8/4 80.22 85.17 71.23 78.30 80.02 80.46 82.67 82.79
OMSE 8/8/8 73.39 11.32 71.30 75.03 79.57 79.30 78.96 48.55
OMSE w/ PTS 8/8/8 82.73 85.27 71.64 78.96 81.25 80.64 82.87 83.07
OMSE w/ PTS, LIS 8/8/4 82.37 85.16 70.87 78.42 80.90 80.41 82.57 82.45

Citation

If you find this repo useful in your research, please consider citing the following paper:

@misc{
    lin2021fqvit,
    title={FQ-ViT: Fully Quantized Vision Transformer without Retraining}, 
    author={Yang Lin and Tianyu Zhang and Peiqin Sun and Zheng Li and Shuchang Zhou},
    year={2021},
    eprint={2111.13824},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
It is the assignment for COMP 576 in Rice University

COMP-576 It is the assignment for COMP 576 in Rice University There are two programming assignments and one Final Project. Assignment 1: It is a MLP a

Maojie Tang 1 Nov 25, 2021
A synthetic texture-invariant dataset for object detection of UAVs

A synthetic dataset for object detection of UAVs This repository contains a synthetic datasets accompanying the paper Sim2Air - Synthetic aerial datas

LARICS Lab 10 Aug 13, 2022
Proto-RL: Reinforcement Learning with Prototypical Representations

Proto-RL: Reinforcement Learning with Prototypical Representations This is a PyTorch implementation of Proto-RL from Reinforcement Learning with Proto

Denis Yarats 74 Dec 06, 2022
NFNets and Adaptive Gradient Clipping for SGD implemented in PyTorch

PyTorch implementation of Normalizer-Free Networks and SGD - Adaptive Gradient Clipping Paper: https://arxiv.org/abs/2102.06171.pdf Original code: htt

Vaibhav Balloli 320 Jan 02, 2023
PyTorch implementation of "LayoutTransformer: Layout Generation and Completion with Self-attention"

PyTorch implementation of "LayoutTransformer: Layout Generation and Completion with Self-attention" to appear in ICCV 2021

Kamal Gupta 75 Dec 23, 2022
Social Fabric: Tubelet Compositions for Video Relation Detection

Social-Fabric Social Fabric: Tubelet Compositions for Video Relation Detection This repository contains the code and results for the following paper:

Shuo Chen 7 Aug 09, 2022
Blender Python - Node-based multi-line text and image flowchart

MindMapper v0.8 Node-based text and image flowchart for Blender Mindmap with shortcuts visible: Mindmap with shortcuts hidden: Notes This was requeste

SpectralVectors 58 Oct 08, 2022
Wenet STT Python

Wenet STT Python Beta Software Simple Python library, distributed via binary wheels with few direct dependencies, for easily using WeNet models for sp

David Zurow 33 Feb 21, 2022
Unofficial PyTorch Implementation of AHDRNet (CVPR 2019)

AHDRNet-PyTorch This is the PyTorch implementation of Attention-guided Network for Ghost-free High Dynamic Range Imaging (CVPR 2019). The official cod

Yutong Zhang 4 Sep 08, 2022
CURL: Contrastive Unsupervised Representations for Reinforcement Learning

CURL Rainbow Status: Archive (code is provided as-is, no updates expected) This is an implementation of CURL: Contrastive Unsupervised Representations

Aravind Srinivas 46 Dec 12, 2022
Code accompanying the paper Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs (Chen et al., CVPR 2020, Oral).

Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs This repository contains PyTorch implementation of our pa

Shizhe Chen 178 Dec 29, 2022
[CVPR 2021] Official PyTorch Implementation for "Iterative Filter Adaptive Network for Single Image Defocus Deblurring"

IFAN: Iterative Filter Adaptive Network for Single Image Defocus Deblurring Checkout for the demo (GUI/Google Colab)! The GUI version might occasional

Junyong Lee 173 Dec 30, 2022
PyTorch implementation of deep GRAph Contrastive rEpresentation learning (GRACE).

GRACE The official PyTorch implementation of deep GRAph Contrastive rEpresentation learning (GRACE). For a thorough resource collection of self-superv

Big Data and Multi-modal Computing Group, CRIPAC 186 Dec 27, 2022
Official implementation of "SinIR: Efficient General Image Manipulation with Single Image Reconstruction" (ICML 2021)

SinIR (Official Implementation) Requirements To install requirements: pip install -r requirements.txt We used Python 3.7.4 and f-strings which are in

47 Oct 11, 2022
STMTrack: Template-free Visual Tracking with Space-time Memory Networks

STMTrack This is the official implementation of the paper: STMTrack: Template-free Visual Tracking with Space-time Memory Networks. Setup Prepare Anac

Zhihong Fu 62 Dec 21, 2022
Assginment for UofT CSC420: Intro to Image Understanding

Run the code Open edge_detection.ipynb in google colab. Upload image1.jpg,image2.jpg and my_image.jpg to '/content/drive/My Drive'. chooose 'Run all'

Ziyi-Zhou 1 Feb 24, 2022
⚾🤖⚾ Automatic baseball pitching overlay in realtime

⚾ Automatically overlaying pitch motion and trajectory with machine learning! This project takes your baseball pitching clips and automatically genera

Tony Chou 240 Dec 05, 2022
An Abstract Cyber Security Simulation and Markov Game for OpenAI Gym

gym-idsgame An Abstract Cyber Security Simulation and Markov Game for OpenAI Gym gym-idsgame is a reinforcement learning environment for simulating at

Kim Hammar 29 Dec 03, 2022
Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision Training Efficiency We show the training efficiency of our DSLP model b

Chenyang Huang 36 Oct 31, 2022
Probabilistic Programming and Statistical Inference in PyTorch

PtStat Probabilistic Programming and Statistical Inference in PyTorch. Introduction This project is being developed during my time at Cogent Labs. The

Stefano Peluchetti 109 Nov 26, 2022