[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

Accepted at ICLR 2021 (Spotlight) [Paper Link].

Overview

Low-precision deep neural network (DNN) training has gained tremendous attention, as reducing precision is one of the most effective knobs for boosting DNNs' training time/energy efficiency. In this paper, we explore low-precision training from a new perspective inspired by recent findings in understanding DNN training: we conjecture that DNNs' precision might have a similar effect to the learning rate during DNN training, and advocate dynamic precision along the training trajectory to further boost the time/energy efficiency of DNN training. Specifically, we propose Cyclic Precision Training (CPT), which cyclically varies the precision between two boundary values to balance the coarse-grained exploration of low precision and the fine-grained optimization of high precision. Through experiments and visualization, we show that CPT helps to (1) converge to wider minima with a lower generalization error and (2) reduce training variance, which opens up a new design knob for simultaneously improving the optimization and efficiency of DNN training.
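To make the "coarse-grained" versus "fine-grained" distinction concrete, here is a minimal sketch of a generic min-max uniform quantizer in plain Python. The function name `fake_quantize` and the min-max range choice are assumptions made for illustration; the quantizer used in this repo may differ.

```python
def fake_quantize(values, num_bits):
    """Snap each value to the nearest of 2**num_bits evenly spaced levels.

    Hypothetical helper for illustration only: a generic min-max uniform
    quantizer, not necessarily the scheme implemented in this repo.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant input needs no quantization.
        return list(values)
    num_steps = 2 ** num_bits - 1          # number of intervals between levels
    scale = (hi - lo) / num_steps          # width of one quantization step
    return [lo + round((v - lo) / scale) * scale for v in values]


def max_abs_error(values, num_bits):
    """Largest rounding error introduced at a given bit-width."""
    quantized = fake_quantize(values, num_bits)
    return max(abs(v - q) for v, q in zip(values, quantized))


weights = [i / 97 for i in range(98)]      # toy "weights" spread over [0, 1]
coarse_error = max_abs_error(weights, 3)   # 8 levels: large steps
fine_error = max_abs_error(weights, 8)     # 256 levels: small steps
```

With fewer bits the step width `scale` grows, so each value is perturbed more; this larger perturbation is the kind of noise that low precision adds, and higher precision reduces.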

Experimental Results

We evaluate CPT on eleven models and five datasets (i.e., ResNet-38/74/110/152/164 and MobileNetV2 on CIFAR-10/100, ResNet-18/34/50 on ImageNet, Transformer on WikiText-103, and LSTM on PTB). Please refer to our paper for more results.

Results on CIFAR-100

  • Test accuracy vs. training computational cost

  • Loss landscape visualization

Results on ImageNet

  • Accuracy - training efficiency trade-off

  • Boosting optimality

Results on WikiText-103 and PTB

Code Usage

cpt_cifar and cpt_imagenet contain the code for CIFAR-10/100 and ImageNet, respectively, and share a similar code structure.

Prerequisites

See env.yml for the complete conda environment. Create a new conda environment:

conda env create -f env.yml
conda activate pytorch

Training on CIFAR-10/100 with CPT

In addition to the common args, e.g., the target network, dataset, and data path via --arch, --dataset, and --datadir, respectively, you also need to: (1) enable cyclic precision training via --is_cyclic_precision; (2) specify the lower and upper precision bounds for the forward pass (weights and activations) and the backward pass (gradients and errors) with --cyclic_num_bits_schedule and --cyclic_num_grad_bits_schedule, respectively (note that CPT uses a constant backward precision for a more stable training process, as analyzed in the appendix of our paper); (3) specify the number of cyclic periods via --num_cyclic_period, which can be set to 32 for all experiments (more ablation studies can be found in Sec. 4.3 of our paper).

  • Example: Training ResNet-74 on CIFAR-100 with CPT (3~8-bit forward, 8-bit backward, and 32 cyclic periods).
cd cpt_cifar
python train.py --save_folder ./logs --arch cifar100_resnet_74 --workers 4 --dataset cifar100 --datadir path-to-cifar100 --is_cyclic_precision --cyclic_num_bits_schedule 3 8 --cyclic_num_grad_bits_schedule 8 8 --num_cyclic_period 32
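As a rough illustration of what the precision bounds and the number of cyclic periods control, the sketch below maps a training step to a bit-width. It assumes a cosine-shaped ramp from the lower to the upper bound within each period; the function name `cyclic_precision`, its arguments, and the step counts are hypothetical and not taken from train.py.

```python
import math


def cyclic_precision(step, total_steps, num_periods, min_bits, max_bits):
    """Return the bit-width for a given training step.

    Hypothetical helper: assumes the precision rises from min_bits to
    max_bits along a half cosine within each period, then restarts.
    """
    period_len = total_steps / num_periods
    phase = (step % period_len) / period_len          # position in period, in [0, 1)
    bits = min_bits + 0.5 * (max_bits - min_bits) * (1 - math.cos(math.pi * phase))
    return int(round(bits))


# Assumed setup: 3200 training steps split into 32 periods of 100 steps each.
TOTAL_STEPS, NUM_PERIODS = 3200, 32

# Forward precision cycles between 3 and 8 bits (--cyclic_num_bits_schedule 3 8).
forward_bits = [cyclic_precision(s, TOTAL_STEPS, NUM_PERIODS, 3, 8)
                for s in range(TOTAL_STEPS)]

# Equal bounds give a constant precision (--cyclic_num_grad_bits_schedule 8 8).
backward_bits = [cyclic_precision(s, TOTAL_STEPS, NUM_PERIODS, 8, 8)
                 for s in range(TOTAL_STEPS)]
```

Passing the same value for both bounds, as the example command does for the gradient schedule, collapses the cycle to a fixed bit-width, which matches the constant backward precision described above.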

We also integrate stochastic weight averaging (SWA) in our code, although it is not used in the reported results of our paper.

Training on ImageNet with CPT

The args for the ImageNet experiments are similar to those for CIFAR-10/100.

  • Example: Training ResNet-34 on ImageNet with CPT (3~8-bit forward, 8-bit backward, and 32 cyclic periods).
cd cpt_imagenet
python train.py --save_folder ./logs --arch resnet34 --warm_up --datadir PATH_TO_IMAGENET --is_cyclic_precision --cyclic_num_bits_schedule 3 8 --cyclic_num_grad_bits_schedule 8 8 --num_cyclic_period 32 --automatic_resume
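The CIFAR and ImageNet commands share the same four CPT flags. The sketch below shows one way such flags could be declared with argparse; the flag names come from the commands above, while the types, the two-value form, and the default of 32 periods are assumptions that may not match the actual train.py.

```python
import argparse


def build_cpt_parser():
    """Build a parser for the CPT-specific flags (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="CPT flag sketch")
    # Switch that turns cyclic precision training on.
    parser.add_argument("--is_cyclic_precision", action="store_true")
    # Lower and upper bit-width bounds for weights and activations.
    parser.add_argument("--cyclic_num_bits_schedule", type=int, nargs=2,
                        metavar=("MIN_BITS", "MAX_BITS"))
    # Lower and upper bit-width bounds for gradients and errors.
    parser.add_argument("--cyclic_num_grad_bits_schedule", type=int, nargs=2,
                        metavar=("MIN_BITS", "MAX_BITS"))
    # Number of precision cycles over the whole training run (assumed default).
    parser.add_argument("--num_cyclic_period", type=int, default=32)
    return parser


example_flags = (
    "--is_cyclic_precision "
    "--cyclic_num_bits_schedule 3 8 "
    "--cyclic_num_grad_bits_schedule 8 8 "
    "--num_cyclic_period 32"
).split()
cpt_args = build_cpt_parser().parse_args(example_flags)
```

Here `cpt_args.cyclic_num_bits_schedule` holds the forward bounds as a two-element list, and `cpt_args.cyclic_num_grad_bits_schedule` holds the backward bounds in the same form.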

Citation

@article{fu2021cpt,
  title={CPT: Efficient Deep Neural Network Training via Cyclic Precision},
  author={Fu, Yonggan and Guo, Han and Li, Meng and Yang, Xin and Ding, Yining and Chandra, Vikas and Lin, Yingyan},
  journal={arXiv preprint arXiv:2101.09868},
  year={2021}
}

Our Related Work

Please also check our work on how to fractionally squeeze out more training cost savings from the most redundant bit level, progressively along the training trajectory and dynamically per input:

Yonggan Fu, Haoran You, Yang Zhao, Yue Wang, Chaojian Li, Kailash Gopalakrishnan, Zhangyang Wang, Yingyan Lin. "FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training". NeurIPS, 2020. [Paper Link] [Code]

Owner
Efficient and Intelligent Computing Lab