How Do Adam and Training Strategies Help BNNs Optimization? In ICML 2021.

Last update: Sep 20, 2022

Overview

AdamBNN

This is the PyTorch implementation of our paper "How Do Adam and Training Strategies Help BNNs Optimization?", published at ICML 2021.

In this work, we explore the intrinsic reasons why Adam is superior to other optimizers such as SGD for BNN optimization, and we provide analytical explanations that support specific training strategies. By visualizing the optimization trajectory, we show that BNN optimization takes place in an extremely rugged loss landscape, and that the second-order momentum in Adam is crucial for revitalizing weights that are dead due to activation saturation in BNNs. Based on this analysis, we derive a specific training scheme that achieves 70.5% top-1 accuracy on the ImageNet dataset using the same architecture as ReActNet, which is 1.1% higher than the ReActNet baseline.
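The role of the second-order momentum can be illustrated with a toy, single-weight sketch. This is not the code from this repository and not the experiment from the paper. It assumes a constant gradient, uses the textbook update rules for SGD with momentum and for Adam, and all function names are hypothetical. The point it shows is that Adam divides the step by the square root of the running average of squared gradients, so a weight that only receives a tiny gradient (as a "dead" weight behind a saturated activation would) still moves by roughly the learning rate per step, while SGD moves it in proportion to the tiny gradient.

```python
import math


def sgd_momentum_step(w, grad, buf, lr=0.1, momentum=0.9):
    """One textbook SGD-with-momentum update for a single scalar weight."""
    buf = momentum * buf + grad
    return w - lr * buf, buf


def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a single scalar weight (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first-order momentum
    v = beta2 * v + (1 - beta2) * grad * grad   # second-order momentum
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


def displacement(grad, steps=10):
    """Total distance each optimizer moves a weight under a constant gradient.

    The constant gradient is an illustrative assumption, not a model of
    real BNN training.
    """
    w_sgd, buf = 1.0, 0.0
    w_adam, m, v = 1.0, 0.0, 0.0
    for t in range(1, steps + 1):
        w_sgd, buf = sgd_momentum_step(w_sgd, grad, buf)
        w_adam, m, v = adam_step(w_adam, grad, m, v, t)
    return 1.0 - w_sgd, 1.0 - w_adam


if __name__ == "__main__":
    for g in (1e-4, 1.0):
        sgd_move, adam_move = displacement(g)
        print(f"grad={g:g}  SGD moved {sgd_move:.6f}  Adam moved {adam_move:.6f}")
```

In this toy setting the Adam displacement is nearly independent of the gradient magnitude, whereas the SGD displacement shrinks linearly with it.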

Citation

If you find our code useful for your research, please consider citing:

@conference{liu2021how,
title = {How do adam and training strategies help bnns optimization?},
author = {Liu, Zechun and Shen, Zhiqiang and Li, Shichao and Helwegen, Koen and Huang, Dong and Cheng, Kwang-Ting},
booktitle = {International Conference on Machine Learning},
year = {2021},
organization={PMLR}
}

Run

1. Requirements:

python3, pytorch 1.7.1, torchvision 0.8.2

2. Data:

Download the ImageNet dataset.

3. Steps to run:

(1) Step 1: binarize activations

Change directory to ./step1/
Run: bash run.sh

(2) Step 2: binarize weights and activations

Change directory to ./step2/
Run: bash run.sh
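For readers new to BNNs, the binarization mentioned in the two steps above can be sketched as follows. This is a minimal scalar illustration and not the code in ./step1/ or ./step2/. It assumes the commonly used sign function with a clipped straight-through estimator for the backward pass, and the repository may use a different gradient approximation. The function names and the clip threshold of 1.0 are hypothetical. The sketch also shows the saturation effect referred to in the overview: once an input falls outside the clip range, the estimated gradient is zero.

```python
def binarize(x):
    """Forward pass: map a real value to +1.0 or -1.0 with the sign function."""
    return 1.0 if x >= 0 else -1.0


def ste_grad(x, upstream=1.0, clip=1.0):
    """Backward pass under a clipped straight-through estimator (assumed here).

    The upstream gradient passes through unchanged while |x| <= clip and is
    zeroed once the input saturates beyond the clip range.
    """
    return upstream if abs(x) <= clip else 0.0


if __name__ == "__main__":
    for x in (-2.5, -0.3, 0.3, 2.5):
        print(f"x={x:+.1f}  binarized={binarize(x):+.0f}  grad={ste_grad(x):.0f}")
```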

Models

Methods	Backbone	Top1-Acc	FLOPs	Trained Model
ReActNet	ReActNet-A	69.4%	0.87 x 10^8	Model-ReAct
AdamBNN	ReActNet-A	70.5%	0.87 x 10^8	Model-ReAct-AdamBNN-Training

Contact

Zechun Liu, HKUST and CMU (zliubq at connect.ust.hk / zechunl at andrew.cmu.edu)

Zhiqiang Shen, CMU (zhiqians at andrew.cmu.edu)
