Pytorch implementation of BRECQ, ICLR 2021

Last update: Dec 28, 2022

Overview

BRECQ

Pytorch implementation of BRECQ, ICLR 2021

@inproceedings{
li&gong2021brecq,
title={BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction},
author={Yuhang Li and Ruihao Gong and Xu Tan and Yang Yang and Peng Hu and Qi Zhang and Fengwei Yu and Wei Wang and Shi Gu},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=POWv6hDd9XH}
}

Pretrained models

All pretrained models are provided and can be loaded via torch.hub.

For example, use res18 = torch.hub.load('yhhhli/BRECQ', model='resnet18', pretrained=True) to get the pretrained ResNet-18 model.

If you encounter a URLError when downloading a pretrained network, it is probably a network failure. As an alternative, download the file manually with wget and move it to ~/.cache/torch/checkpoints, which the load_state_dict_from_url function checks before downloading.

For example:

wget https://github.com/yhhhli/BRECQ/releases/download/v1.0/resnet50_imagenet.pth.tar 
mv resnet50_imagenet.pth.tar ~/.cache/torch/checkpoints
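
The directory that is checked may depend on the installed PyTorch version: the text above names ~/.cache/torch/checkpoints, while the log in the "Cuda Error when launching example" comment on this page shows ~/.cache/torch/hub/checkpoints. A minimal sketch, assuming only those two locations matter, of checking where a manually downloaded file ended up. find_cached_checkpoint is a hypothetical helper, not part of BRECQ or PyTorch:

```python
from pathlib import Path
from typing import Iterable, Optional

# Candidate cache directories (assumption: only these two layouts are considered).
CANDIDATE_DIRS = (
    Path.home() / ".cache" / "torch" / "checkpoints",
    Path.home() / ".cache" / "torch" / "hub" / "checkpoints",
)


def find_cached_checkpoint(filename: str, dirs: Iterable = CANDIDATE_DIRS) -> Optional[Path]:
    """Return the first existing path to `filename` inside `dirs`, or None."""
    for directory in dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


hit = find_cached_checkpoint("resnet50_imagenet.pth.tar")
print(hit if hit else "not cached yet; download it manually as shown above")
```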

Usage

python main_imagenet.py --data_path PATH/TO/DATA --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration

You can get the following output:

Quantized accuracy before brecq: 0.13599999248981476
Weight quantization accuracy: 66.32799530029297
Full quantization (W2A4) accuracy: 65.21199798583984
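
In the command above, --n_bits_w 2 and --n_bits_a 4 select 2-bit weights and 4-bit activations (W2A4). To make the bit-width concrete, here is a toy sketch of plain round-to-nearest uniform quantization. It is not the block-reconstruction procedure this repository implements, and quantize_uniform is a hypothetical name used only for illustration:

```python
def quantize_uniform(values, n_bits):
    """Round each value onto 2**n_bits evenly spaced levels spanning [min, max]."""
    lo, hi = min(values), max(values)
    steps = 2 ** n_bits - 1                      # intervals between lowest and highest level
    scale = (hi - lo) / steps if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in values]


def max_abs_error(original, quantized):
    return max(abs(a - b) for a, b in zip(original, quantized))


weights = [-0.9, -0.3, 0.0, 0.2, 0.6]
w2 = quantize_uniform(weights, n_bits=2)         # at most 4 distinct values
w4 = quantize_uniform(weights, n_bits=4)         # at most 16 distinct values

print("2-bit:", len(set(w2)), "levels used, max error", max_abs_error(weights, w2))
print("4-bit:", len(set(w4)), "levels used, max error", max_abs_error(weights, w4))
```

With only four levels available at 2 bits, naive rounding loses a lot of precision, which is consistent with the low "before brecq" accuracy in the output above.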

Comments

How to reproduce the zero-data result?

As the title says.

There is a bug: https://github.com/yhhhli/BRECQ/blob/da93abc4f7e3ef437b356a2df8a5ecd8c326556e/main_imagenet.py#L173

args.batchsize should be args.workers.

opened by yyfcc17 6
why not quantize the activation of the last conv layer in a block
Hi, Thanks for the release of your code. But I have one problem regarding the detail of the implementation. In quant_block.py, take the following code of ResNet-18 and ResNet-34 for example. The disable_act_quant is set True for conv2, which disables the quantization of the output of conv2.

class QuantBasicBlock(BaseQuantBlock):
    """
    Implementation of Quantized BasicBlock used in ResNet-18 and ResNet-34.
    """
    def __init__(self, basic_block: BasicBlock, weight_quant_params: dict = {}, act_quant_params: dict = {}):
        super().__init__(act_quant_params)
        self.conv1 = QuantModule(basic_block.conv1, weight_quant_params, act_quant_params)
        self.conv1.activation_function = basic_block.relu1
        self.conv2 = QuantModule(basic_block.conv2, weight_quant_params, act_quant_params, disable_act_quant=True)

        # modify the activation function to ReLU
        self.activation_function = basic_block.relu2

        if basic_block.downsample is None:
            self.downsample = None
        else:
            self.downsample = QuantModule(basic_block.downsample[0], weight_quant_params, act_quant_params,
                                          disable_act_quant=True)
        # copying all attributes in original block
        self.stride = basic_block.stride

This causes a boost in accuracy. The following is the result I got using your code and the same ImageNet dataset you used in the paper. [1] and [2] denote the modifications I made to the original code.

[1]: quant_block.py→QuantBasicBlock→__init__→self.conv2 = QuantModule(..., disable_act_quant=True) and self.downsample = QuantModule(basic_block.downsample[0], weight_quant_params, act_quant_params, disable_act_quant=True): change True to False.

[2]: quant_block.py→QuantInvertedResidual→__init__→self.conv = nn.Sequential(..., QuantModule(..., disable_act_quant=True)): change True to False.

But I do not think this is applicable to most NPUs, which quantize the output of every conv layer. So why not quantize the activation of the last conv layer in a block? Is there any particular reason for this? Also, for the methods you compared against in your paper, have you checked whether they do the same thing?
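
The two placements being compared can be sketched with a toy block tail. A code comment quoted in another issue on this page says that disabling activation quantization is meant for a convolution before an element-wise operation, with the activation function and quantization applied after that operation. The sketch below assumes fixed clipping ranges and 2-bit activations purely for illustration; fake_quant and block_tail are hypothetical names, not functions from the repository:

```python
def fake_quant(xs, n_bits, lo, hi):
    """Clamp to [lo, hi], then round onto 2**n_bits evenly spaced levels."""
    step = (hi - lo) / (2 ** n_bits - 1)
    return [lo + round((min(max(x, lo), hi) - lo) / step) * step for x in xs]


def relu(xs):
    return [max(x, 0.0) for x in xs]


def block_tail(conv2_out, shortcut, n_bits, quantize_conv2_output):
    """Toy tail of a residual block: (optional quant) -> add shortcut -> ReLU -> quant."""
    if quantize_conv2_output:
        # Modification [1]: conv2's output is quantized before the element-wise add.
        conv2_out = fake_quant(conv2_out, n_bits, lo=-2.0, hi=2.0)
    summed = [a + b for a, b in zip(conv2_out, shortcut)]
    # In both variants the block output is quantized after the add and the ReLU.
    return fake_quant(relu(summed), n_bits, lo=0.0, hi=4.0)


conv2_out, shortcut = [0.2], [1.6]
as_released = block_tail(conv2_out, shortcut, n_bits=2, quantize_conv2_output=False)
modified = block_tail(conv2_out, shortcut, n_bits=2, quantize_conv2_output=True)
print("quantize once after add:", as_released)
print("quantize conv2 output too:", modified)
```

For this input the extra rounding step moves the result to a different output level. The sketch only shows that the two placements can differ; it says nothing about which one a given NPU supports.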
opened by frankgt 3
disable act quantization is designed for convolution

Hi, Very impressive coding.

There is a question about the quantization of activation values.

In the code:

disable act quantization is designed for convolution before elemental-wise operation,

in that case, we apply activation function and quantization after ele-wise op.

Why can it be replaced like this?

Thanks

opened by xiayizhan2017 2
How to deal with data parallel and distributed data parallel?

As far as I can see, your code runs on a single GPU only, while I need to test it with multiple GPUs for other implementations. I would like to check whether you have run your code with data parallel and distributed data parallel.

opened by jang0977 2
What is the purpose for setting retain_graph=True?

https://github.com/yhhhli/BRECQ/blob/2888b29de0a88ece561ae2443defc758444e41c1/quant/block_recon.py#L91

What is the purpose for setting retain_graph=True?
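
For context, this is what retain_graph does in PyTorch in general. The sketch does not explain why the linked line in block_recon.py needs it. By default, the buffers saved for the backward pass are freed after the first backward() call, so a second call through the same graph fails unless the first one passed retain_graph=True:

```python
import torch

x = torch.ones(3, requires_grad=True)

# Default behaviour: saved buffers are freed after the first backward pass.
loss = (x * x).sum()
loss.backward()
try:
    loss.backward()
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("second backward failed:", str(err).splitlines()[0])

# With retain_graph=True on the first call, the same graph can be traversed again.
x.grad = None
loss = (x * x).sum()
loss.backward(retain_graph=True)
loss.backward()                     # gradients of both calls accumulate in x.grad
print(x.grad)
```

Keeping the graph alive costs memory, so it is normally enabled only when the same graph really has to be traversed more than once.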

opened by un-knight 2
Cannot reproduce the accuracy

Greetings,

Really appreciate your open source contribution.

However, it seems the accuracy reported in the paper cannot be reproduced with the standard ImageNet dataset. For instance, with the full-precision models I measured ResNet-18 at 70.186% and MobileNetV2 at 71.618%, which is slightly lower than the results in your paper (71.08 and 72.49 respectively).

Have you utilized any preprocessing techniques other than imagenet.build_imagenet_data?

Thanks

opened by mike-zyz 2

suggest replacing .view with .reshape in accuracy() function

Got an error:

Traceback (most recent call last):
  File "main_imagenet.py", line 198, in <module>
    print('Quantized accuracy before brecq: {}'.format(validate_model(test_loader, qnn)))
  File "/home/xxxx/anaconda3/envs/torch/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "main_imagenet.py", line 108, in validate_model
    acc1, acc5 = accuracy(output, target, topk=(1, 5))
  File "main_imagenet.py", line 77, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

So suggest replacing .view with .reshape in accuracy() function.
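
A sketch of the suggested change. Only the correct_k line and the accuracy(output, target, topk=(1, 5)) call appear in the traceback above; the rest of the function body is an assumption modelled on the common PyTorch ImageNet example and may differ from main_imagenet.py:

```python
import torch


def accuracy(output, target, topk=(1,)):
    """Top-k accuracy in percent; .reshape accepts non-contiguous slices."""
    maxk = max(topk)
    batch_size = target.size(0)
    _, pred = output.topk(maxk, 1, True, True)   # (batch, maxk) class indices
    pred = pred.t()                               # (maxk, batch), a non-contiguous view
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    res = []
    for k in topk:
        # .reshape copies when the memory layout rules out a view; .view raises instead.
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res


# The failure mode in isolation: a transposed tensor cannot be flattened with .view.
t = torch.arange(6).reshape(2, 3).t()
try:
    t.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True
flat = t.reshape(-1)

logits = torch.tensor([[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.2, 0.3, 0.5]])
labels = torch.tensor([1, 0, 0])
acc1, acc3 = accuracy(logits, labels, topk=(1, 3))
print(view_failed, flat.tolist(), acc1.item(), acc3.item())
```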

opened by un-knight 1

channel_wise quantization

Hi, nice idea for quantization. But it seems that the paper (not including the appendix) does not state that channel-wise quantization is used, whereas the code shows that it is. As we know, channel-wise quantization naturally outperforms layer-wise quantization. So it may be hard to claim that the performance of your method is close to QAT.
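
The difference between the two granularities can be illustrated with a toy weight matrix whose channels have very different ranges. This is a sketch using plain min-max round-to-nearest quantization, not the repository's quantizer, and all function names are hypothetical. It makes no claim about the numbers reported in the paper:

```python
def quantize_with_range(values, lo, hi, n_bits):
    """Round values onto 2**n_bits evenly spaced levels spanning [lo, hi]."""
    step = (hi - lo) / (2 ** n_bits - 1) or 1.0   # avoid a zero step for constant input
    return [lo + round((v - lo) / step) * step for v in values]


def quantize_layer_wise(weight, n_bits):
    """One shared range for the whole layer."""
    flat = [v for channel in weight for v in channel]
    lo, hi = min(flat), max(flat)
    return [quantize_with_range(channel, lo, hi, n_bits) for channel in weight]


def quantize_channel_wise(weight, n_bits):
    """A separate range for each output channel (row)."""
    return [quantize_with_range(ch, min(ch), max(ch), n_bits) for ch in weight]


def mse(a, b):
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Two output channels: one with tiny weights, one with large weights.
weight = [[-0.02, 0.01, 0.03], [-1.5, 0.4, 2.0]]
mse_layer = mse(weight, quantize_layer_wise(weight, n_bits=2))
mse_channel = mse(weight, quantize_channel_wise(weight, n_bits=2))
print("layer-wise MSE:  ", mse_layer)
print("channel-wise MSE:", mse_channel)
```

With a shared range, the small-magnitude channel collapses onto a single level, which is the effect the comment refers to.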

opened by shiyuetianqiang 1
Some questions about implementation details
Hello, thank you for an interesting paper and nice code.

I have two questions concerning implementation details.

Does the "one-by-one" block reconstruction mentioned in the paper mean that input to each block comes from already quantized preceding blocks, i.e. each block may correct quantization errors coming from previous blocks? Or maybe input to each block is collected from the full-precision model?

Am I correct in my understanding that in the block-wise reconstruction objective you use the gradients of each object in the calibration sample independently (i.e. no gradient averaging or something similar, as in Adam, which is mentioned in the paper)? Besides, what is happening here in data_utils.py, and why do you add 1.0 to the gradients?

cached_grads = cached_grads.abs() + 1.0
# scaling to make sure its mean is 1
# cached_grads = cached_grads * torch.sqrt(cached_grads.numel() / cached_grads.pow(2).sum())
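
To see what the two lines in this snippet compute, here is a pure-Python rendering of the arithmetic. It is one reading of the quoted code and does not explain why the authors chose the +1.0 variant; the function names are hypothetical:

```python
import math


def abs_plus_one(grads):
    """Active line: every value becomes |g| + 1, so each result is at least 1."""
    return [abs(g) + 1.0 for g in grads]


def rms_normalize(grads):
    """Commented-out line: rescale so that the mean of the squared values is 1."""
    factor = math.sqrt(len(grads) / sum(g * g for g in grads))
    return [g * factor for g in grads]


grads = [-0.002, 0.0005, 0.01, -0.0001]
shifted = abs_plus_one(grads)
normalized = rms_normalize(grads)
mean_square = sum(g * g for g in normalized) / len(normalized)

print("abs + 1.0:     ", shifted)
print("rms-normalized:", normalized)
print("mean of squares after normalization:", mean_square)
```

Note that the commented-out scaling normalizes the mean of the squares rather than the mean itself, and that with small gradients the abs() + 1.0 variant yields values that are all close to 1.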

Thank you for your time and consideration!
opened by AndreevP 0
Quantization doesn't work?
Hi,

So I tried running your code on CIFAR-10 with a pre-trained ResNet50 model. I've attached the code below. The float model reaches around 93% accuracy, but after quantization the accuracy does not come anywhere close to that. I get:

Accuracy of the network on the 10000 test images: 10.0 % top5: 52.28 %

Please help me with this. The code is inside the zip file.

main_cifar.zip
opened by praneet195 0
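Using the Fisher-diag method proposed in the paper for Hessian estimation raises "Trying to backward through the graph a second time"

As proposed in the paper, estimating the Hessian with the Fisher-diag method requires computing the gradient of each layer's pre-activation. However, when the code is actually run, cur_grad = get_grad(cali_data[i * batch_size:(i + 1) * batch_size]) in save_grad_data raises the error "Trying to backward through the graph a second time" when it reaches the second batch; the first batch does not raise an error. Has the author encountered a similar situation?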

opened by ariescts 2
Cuda Error when launching example

[email protected]:/path_to/BRECQ# python main_imagenet.py --data_path /path_to/IMAGENET_2012/ --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration
You are using fake SyncBatchNorm2d who is actually the official BatchNorm2d
==> Using Pytorch Dataset
Downloading: "https://github.com/yhhhli/BRECQ/releases/download/v1.0/resnet18_imagenet.pth.tar" to /root/.cache/torch/hub/checkpoints/resnet18_imagenet.pth.tar
100%|████████████████████| 44.6M/44.6M [00:27<00:00, 1.70MB/s]
Traceback (most recent call last):
  File "main_imagenet.py", line 178, in <module>
    cnn.cuda()
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 680, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 593, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 680, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

opened by L-ED 1

Releases(v1.0)

v1.0(Feb 10, 2021)

Pretrained models on ImageNet
Source code(tar.gz)
Source code(zip)
mnasnet.pth.tar(47.90 MB)
mobilenetv2.pth.tar(26.95 MB)
regnet_3200m.pth.tar(58.66 MB)
regnet_600m.pth.tar(23.81 MB)
resnet18_imagenet.pth.tar(44.64 MB)
resnet50_imagenet.pth.tar(97.75 MB)

Owner

Yuhang Li

Research Intern at @SenseTime Group Limited
