Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

Overview

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

This repository contains the source code for the paper (link will be posted).

Requirements

  • GPU
  • Python 3
  • PyTorch 1.9
    • Earlier version may work, but untested.
  • pip install -r requirements.txt
  • If running ResNet-21 or ImageNet experiments, first download and prepare the ImageNet 2012 dataset with bin/imagenet_prep.sh script.

Running

For non-ImageNet experiments, the main python file is main.py. To see its arguments:

python main.py --help

Running for the first time can take a little longer due to automatic downloading of the MNIST and Cifar-10 dataset from the Internet.

For ImageNet experiments, the main python files are main_imagenet_float.py and main_imagenet_binary.py. Too see their arguments:

python main_imagenet_float.py --help

and

python main_imagenet_binary.py --help

The ImageNet dataset must be already downloaded and prepared. Please see the requirements section for details.

Scripts

The main python file has many options. The following scripts runs training with hyper-parameters given in the paper. Output includes a run-log text file and tensorboard files. These files are saved to ./logs and reused for subsequent runs.

300-100-10

Sensitivity Pre-training

# Layer 1. Learning rate 0.1.
./scripts/mnist/300/sensitivity/layer.sh sensitivity forward 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/mnist/300/sensitivity/layer.sh sensitivity 231 0.1 0
# Layer 3. Learning rate 0.1.
./scripts/mnist/300/sensitivity/layer.sh sensitivity reverse 0.1 0

Output files and run-log are written to ./logs/mnist/val/sensitivity/.

Hyperparam search

For floating-point training:

# Learning rate 0.1.
./scripts/mnist/300/val/float.sh hyperparam 0.1 0

For full binary training:

# Learning rate 0.1.
./scripts/mnist/300/val/binary.sh hyperparam 0.1 0

For iterative training:

# Forward order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam forward 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam reverse 0.1 0
# 1, 3, 2 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 132 0.1 0
# 2, 1, 3 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 213 0.1 0
# 2, 3, 1 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 231 0.1 0
# 3, 1, 2 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 312 0.1 0

Output files and run-log are written to ./logs/mnist/val/hyperparam/.

Full Training

For floating-point training:

# Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/float.sh full 0.1 316 0

For full binary training:

# Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/binary.sh full 0.1 316 0

For iterative training:

# Forward order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full forward 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full reverse 0.1 316 0
# 1, 3, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 132 0.1 316 0
# 2, 1, 3 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 213 0.1 316 0
# 2, 3, 1 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 231 0.1 316 0
# 3, 1, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 312 0.1 316 0

Output files and run-log are written to ./logs/mnist/run/full/.

784-100-10

Sensitivity Pre-training

# Layer 1. Learning rate 0.1.
./scripts/mnist/784/sensitivity/layer.sh sensitivity forward 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/mnist/784/sensitivity/layer.sh sensitivity 231 0.1 0
# Layer 3. Learning rate 0.1.
./scripts/mnist/784/sensitivity/layer.sh sensitivity reverse 0.1 0

Output files and run-log are written to ./logs/mnist/val/sensitivity/.

Hyperparam search

For floating-point training:

# Learning rate 0.1.
./scripts/mnist/784/val/float.sh hyperparam 0.1 0

For full binary training:

# Learning rate 0.1.
./scripts/mnist/784/val/binary.sh hyperparam 0.1 0

For iterative training:

# Forward order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam forward 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam reverse 0.1 0
# 1, 3, 2 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 132 0.1 0
# 2, 1, 3 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 213 0.1 0
# 2, 3, 1 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 231 0.1 0
# 3, 1, 2 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 312 0.1 0

Output files and run-log are written to ./logs/mnist/val/hyperparam/.

Full Training

For floating-point training:

# Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/float.sh full 0.1 316 0

For full binary training:

# Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/binary.sh full 0.1 316 0

For iterative training:

# Forward order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full forward 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full reverse 0.1 316 0
# 1, 3, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 132 0.1 316 0
# 2, 1, 3 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 213 0.1 316 0
# 2, 3, 1 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 231 0.1 316 0
# 3, 1, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 312 0.1 316 0

Output files and run-log are written to ./logs/mnist/run/full/.

Vgg-5

Sensitivity Pre-training

# Layer 1. Learning rate 0.1.
./scripts/cifar10/vgg5/sensitivity/layer.sh sensitivity 1 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/cifar10/vgg5/sensitivity/layer.sh sensitivity 2 0.1 0
# Layer 5. Learning rate 0.1.
./scripts/cifar10/vgg5/sensitivity/layer.sh sensitivity 5 0.1 0

Output files and run-log are written to ./logs/cifar10/val/sensitivity/.

Hyperparam Search

For floating-point training:

# Learning rate 0.1.
./scripts/cifar10/vgg5/val/float.sh hyperparam 0.1 0

For full binary training:

# Learning rate 0.1.
./scripts/cifar10/vgg5/val/binary.sh hyperparam 0.1 0

For iterative training:

# Forward order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam forward 0.1 0
# Ascend order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam ascend 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam reverse 0.1 0
# Descend order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam descend 0.1 0
# Random order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam random 0.1 0

Output files and run-log are written to ./logs/cifar10/val/hyperparam/.

Full Training

For floating-point training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/float.sh full 0.1 316 0

For full binary training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/binary.sh full 0.1 316 0

For iterative training:

# Forward order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full forward 0.1 316 0
# Ascend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full ascend 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full reverse 0.1 316 0
# Descend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full descend 0.1 316 0
# Random order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full random 0.1 316 0

Output files and run-log are written to ./logs/cifar10/run/full/.

Vgg-9

Sensitivity Pre-training

# Layer 1. Learning rate 0.1.
./scripts/cifar10/vgg9/sensitivity/layer.sh sensitivity 1 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/cifar10/vgg9/sensitivity/layer.sh sensitivity 2 0.1 0
# Layer 5. Learning rate 0.1.
./scripts/cifar10/vgg9/sensitivity/layer.sh sensitivity 5 0.1 0

Output files and run-log are written to ./logs/cifar10/val/sensitivity/.

Hyperparam Search

For floating-point training:

# Learning rate 0.1.
./scripts/cifar10/vgg9/val/float.sh hyperparam 0.1 0

For full binary training:

# Learning rate 0.1.
./scripts/cifar10/vgg9/val/binary.sh hyperparam 0.1 0

For iterative training:

# Forward order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam forward 0.1 0
# Ascend order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam ascend 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam reverse 0.1 0
# Descend order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam descend 0.1 0
# Random order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam random 0.1 0

Output files and run-log are written to ./logs/cifar10/val/hyperparam/.

Full Training

For floating-point training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/float.sh full 0.1 316 0

For full binary training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/binary.sh full 0.1 316 0

For iterative training:

# Forward order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full forward 0.1 316 0
# Ascend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full ascend 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full reverse 0.1 316 0
# Descend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full descend 0.1 316 0
# Random order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full random 0.1 316 0

Output files and run-log are written to ./logs/cifar10/run/full/.

ResNet-20

Sensitivity Pre-training

# Layer 1. Learning rate 0.1.
./scripts/cifar10/resnet20/sensitivity/layer.sh sensitivity 1 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/cifar10/resnet20/sensitivity/layer.sh sensitivity 2 0.1 0
# ...
# Layer 20. Learning rate 0.1.
./scripts/cifar10/resnet20/sensitivity/layer.sh sensitivity 20 0.1 0

Output files and run-log are written to ./logs/cifar10/val/sensitivity/.

Hyperparam Search

For floating-point training:

# Learning rate 0.1
./scripts/cifar10/resnet20/val/float.sh hyperparam 0.1 0

For full binary training:

# Learning rate 0.1
./scripts/cifar10/resnet20/val/binary.sh hyperparam 0.1 0

For iterative training:

# Forward order. Learning rate 0.1
./scripts/cifar10/resnet20/val/layer.sh hyperparam forward 0.1 0
# Ascend order. Learning rate 0.1
./scripts/cifar10/resnet20/val/layer.sh hyperparam ascend 0.1 0
# Reverse order. Learning rate 0.1
./scripts/cifar10/resnet20/val/layer.sh hyperparam reverse 0.1 0
# Descend order. Learning rate 0.1
./scripts/cifar10/resnet20/val/layer.sh hyperparam descend 0.1 0
# Random order. Learning rate 0.1
./scripts/cifar10/resnet20/val/layer.sh hyperparam random 0.1 0

Output files and run-log are written to ./logs/cifar10/val/hyperparam/.

Full Training

For floating-point training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/float.sh full 0.1 316 0

For full binary training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/binary.sh full 0.1 316 0

For iterative training:

# Forward order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full forward 0.1 316 0
# Ascend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full ascend 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full reverse 0.1 316 0
# Descend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full descend 0.1 316 0
# Random order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full random 0.1 316 0

Output files and run-log are written to ./logs/cifar10/run/full/.

ResNet-21

To run experiments for ResNet-21, first download and prepare the ImageNet dataset. See the requirements section at the beginning of this readme. We assume the dataset is prepared and is at ./imagenet.

Sensitivity Pre-training

# Layer 1. Learning rate 0.01.
./scripts/imagenet/layer.sh sensitivity ./imagenet 20 "[20]" 20 1 0.01
# Layer 2. Learning rate 0.01.
./scripts/imagenet/layer.sh sensitivity ./imagenet 20 "[20]" 20 2 0.01
# Layer 21. Learning rate 0.01.
./scripts/imagenet/layer.sh sensitivity ./imagenet 20 "[20]" 20 21 0.01

Output files and run-log are written to ./logs/imagenet/sensitivity/.

Full Training

For floating-point training:

# Learning rate 0.01.
./scripts/imagenet/float.sh full ./imagenet 67 "[42,57]" 0.01

For full binary training:

# Learning rate 0.01.
./scripts/imagenet/binary.sh full ./imagenet 67 "[42,57]" 0.01

For layer-by-layer training:

# Forward order
./scripts/imagenet/layer.sh full ./imagenet 67 "[42,57]" 2 forward 0.01
# Ascending order
./scripts/imagenet/layer.sh full ./imagenet 67 "[42,57]" 2 ascend 0.01

For all scripts, output files and run-log are written to ./logs/imagenet/full/.

License

See LICENSE

Contributing

See the contributing guide for details of how to participate in development of the module.

Owner
Rakuten Group, Inc.
Rakuten Group, Inc.
Code for the Population-Based Bandits Algorithm, presented at NeurIPS 2020.

Population-Based Bandits (PB2) Code for the Population-Based Bandits (PB2) Algorithm, from the paper Provably Efficient Online Hyperparameter Optimiza

Jack Parker-Holder 22 Nov 16, 2022
A library for finding knowledge neurons in pretrained transformer models.

knowledge-neurons An open source repository replicating the 2021 paper Knowledge Neurons in Pretrained Transformers by Dai et al., and extending the t

EleutherAI 96 Dec 21, 2022
[NeurIPS 2020] Official Implementation: "SMYRF: Efficient Attention using Asymmetric Clustering".

SMYRF: Efficient attention using asymmetric clustering Get started: Abstract We propose a novel type of balanced clustering algorithm to approximate a

Giannis Daras 46 Dec 22, 2022
Implementation of U-Net and SegNet for building segmentation

Specialized project Created by Katrine Nguyen and Martin Wangen-Eriksen as a part of our specialized project at Norwegian University of Science and Te

Martin.w-e 3 Dec 07, 2022
Protect against subdomain takeover

domain-protect scans Amazon Route53 across an AWS Organization for domain records vulnerable to takeover deploy to security audit account scan your en

OVO Technology 0 Nov 17, 2022
As-ViT: Auto-scaling Vision Transformers without Training

As-ViT: Auto-scaling Vision Transformers without Training [PDF] Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou In ICLR 2

VITA 68 Sep 05, 2022
Exemplo de implementação do padrão circuit breaker em python

fast-circuit-breaker Circuit breakers existem para permitir que uma parte do seu sistema falhe sem destruir todo seu ecossistema de serviços. Michael

James G Silva 17 Nov 10, 2022
StyleGAN2-ADA - Official PyTorch implementation

Need Help? If you’re new to StyleGAN2-ADA and looking to get started, please check out this video series from a course Lia Coleman and I taught in Oct

Derrick Schultz 217 Jan 04, 2023
Image processing in Python

scikit-image: Image processing in Python Website (including documentation): https://scikit-image.org/ Mailing list: https://mail.python.org/mailman3/l

Image Processing Toolbox for SciPy 5.2k Dec 31, 2022
YOLOX + ROS(1, 2) object detection package

YOLOX + ROS(1, 2) object detection package

Ar-Ray 158 Dec 21, 2022
This repository contains the map content ontology used in narrative cartography

Narrative-cartography-ontology This repository contains the map content ontology used in narrative cartography, which is associated with a submission

Weiming Huang 0 Oct 31, 2021
The first public PyTorch implementation of Attentive Recurrent Comparators

arc-pytorch PyTorch implementation of Attentive Recurrent Comparators by Shyam et al. A blog explaining Attentive Recurrent Comparators Visualizing At

Sanyam Agarwal 150 Oct 14, 2022
Official git for "CTAB-GAN: Effective Table Data Synthesizing"

CTAB-GAN This is the official git paper CTAB-GAN: Effective Table Data Synthesizing. The paper is published on Asian Conference on Machine Learning (A

30 Dec 26, 2022
Mining-the-Social-Web-3rd-Edition - The official online compendium for Mining the Social Web, 3rd Edition (O'Reilly, 2018)

Mining the Social Web, 3rd Edition The official code repository for Mining the Social Web, 3rd Edition (O'Reilly, 2019). The book is available from Am

Mikhail Klassen 838 Jan 01, 2023
Run Effective Large Batch Contrastive Learning on Limited Memory GPU

Gradient Cache Gradient Cache is a simple technique for unlimitedly scaling contrastive learning batch far beyond GPU memory constraint. This means tr

Luyu Gao 198 Dec 29, 2022
Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques"

THESIS_CAIRONE_FIORENTINO Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques" GENERATE TOKE

cairone_fiorentino97 1 Dec 10, 2021
Diabet Feature Engineering - Predict whether people have diabetes when their characteristics are specified

Diabet Feature Engineering - Predict whether people have diabetes when their characteristics are specified

Şebnem 6 Jan 18, 2022
Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)

Realtime Multi-Person Pose Estimation By Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh. Introduction Code repo for winning 2016 MSCOCO Keypoints Cha

Zhe Cao 4.9k Dec 31, 2022
Notebooks, slides and dataset of the CorrelAid Machine Learning Winter School

CorrelAid Machine Learning Winter School Welcome to the CorrelAid ML Winter School! Task The problem we want to solve is to classify trees in Roosevel

CorrelAid 12 Nov 23, 2022