A general framework for inferring CNNs efficiently. Reduce the inference latency of MobileNet-V3 by 1.3x on an iPhone XS Max without sacrificing accuracy.

Last update: Oct 28, 2022

Overview

GFNet-Pytorch (NeurIPS 2020)

This repo contains the official code and pre-trained models for the glance and focus network (GFNet).

Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification

Citation

@inproceedings{NeurIPS2020_7866,
        title = {Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification},
       author = {Wang, Yulin and Lv, Kangchen and Huang, Rui and Song, Shiji and Yang, Le and Huang, Gao},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
         year = {2020},
}

Update on 2020/10/08: Release Pre-trained Models and the Inference Code on ImageNet.

Update on 2020/12/28: Release Training Code.

Introduction

Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification by processing a sequence of relatively small inputs, which are strategically cropped from the original image. Experiments on ImageNet show that our method consistently improves the computational efficiency of a wide variety of deep models. For example, it further reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by 20% without sacrificing accuracy.
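
As a rough illustration of the idea above, the sketch below looks at a coarse view of the whole image first and then at a sequence of small crops, stopping as soon as a prediction is confident enough. It is a toy under stated assumptions, not the repository's implementation: `classify` and `propose` are hypothetical callables standing in for the classifier and the patch proposal network, the image is a plain nested list, and the classifier here is stateless, whereas the actual model may combine information across steps.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # toy stand-in for an H x W single-channel image


def downsample(image: Image, size: int) -> Image:
    """Nearest-neighbour resize of the full image to size x size."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def crop(image: Image, top: int, left: int, size: int) -> Image:
    """Cut a size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def glance_and_focus(
    image: Image,
    classify: Callable[[Image], Tuple[int, float]],  # hypothetical: patch -> (label, confidence)
    propose: Callable[[Image], Tuple[int, int]],     # hypothetical: patch -> (top, left) of next crop
    thresholds: Sequence[float],                     # one confidence threshold per step
    patch_size: int,
) -> Tuple[int, int]:
    """Return (predicted_label, number_of_steps_used)."""
    if not thresholds:
        raise ValueError("at least one step is required")
    h, w = len(image), len(image[0])
    patch = downsample(image, patch_size)  # first step: coarse view of the whole image
    label = -1
    for step, threshold in enumerate(thresholds, start=1):
        label, confidence = classify(patch)
        if confidence >= threshold:
            return label, step  # confident enough: stop early
        if step < len(thresholds):
            top, left = propose(patch)
            # Clamp the proposal so the crop stays inside the image.
            top = max(0, min(top, h - patch_size))
            left = max(0, min(left, w - patch_size))
            patch = crop(image, top, left, patch_size)
    return label, len(thresholds)  # step budget exhausted: return the last prediction
```

Setting the last threshold to 0 makes the final step always produce an output, which mirrors how a fixed maximum sequence length T bounds the computation.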

Results

Top-1 accuracy on ImageNet vs. Multiply-Adds

Top-1 accuracy on ImageNet vs. inference latency (ms) on an iPhone XS Max

Visualization

Pre-trained Models

Backbone CNNs	Patch Size	T	Links
ResNet-50	96x96	5	Tsinghua Cloud / Google Drive
ResNet-50	128x128	5	Tsinghua Cloud / Google Drive
DenseNet-121	96x96	5	Tsinghua Cloud / Google Drive
DenseNet-169	96x96	5	Tsinghua Cloud / Google Drive
DenseNet-201	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-600MF	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-800MF	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-1.6GF	96x96	5	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.00)	96x96	3	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.00)	128x128	3	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.25)	128x128	3	Tsinghua Cloud / Google Drive
EfficientNet-B2	128x128	4	Tsinghua Cloud / Google Drive
EfficientNet-B3	128x128	4	Tsinghua Cloud / Google Drive
EfficientNet-B3	144x144	4	Tsinghua Cloud / Google Drive

What is contained in the checkpoints:

**.pth.tar
├── model_name: name of the backbone CNNs (e.g., resnet50, densenet121)
├── patch_size: size of image patches (i.e., H' or W' in the paper)
├── model_prime_state_dict, model_state_dict, fc, policy: state dictionaries of the four components of GFNets
├── model_flops, policy_flops, fc_flops: Multiply-Adds of a single forward pass of the encoder, patch proposal network and classifier
├── flops: a list containing the Multiply-Adds corresponding to each length of the input sequence during inference
├── anytime_classification: results of anytime prediction (in Top-1 accuracy)
├── dynamic_threshold: the confidence thresholds used in budgeted batch classification
├── budgeted_batch_classification: results of budgeted batch classification (a two-item list, [0] and [1] correspond to the two coordinates of a curve)
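
A checkpoint laid out as above can be inspected with a few lines of Python. The following is a minimal sketch, assuming the file is an ordinary dictionary saved with `torch.save`; the helper names `load_checkpoint` and `summarize_checkpoint` are illustrative and are not part of this repo.

```python
from typing import Any, Dict

# Keys listed in the checkpoint description above.
EXPECTED_KEYS = (
    "model_name", "patch_size",
    "model_prime_state_dict", "model_state_dict", "fc", "policy",
    "model_flops", "policy_flops", "fc_flops", "flops",
    "anytime_classification", "dynamic_threshold",
    "budgeted_batch_classification",
)


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Load a checkpoint onto the CPU (assumes it was written by torch.save)."""
    import torch  # imported lazily so the summary helper works without PyTorch
    return torch.load(path, map_location="cpu")


def summarize_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the small metadata fields and report any expected key that is absent."""
    return {
        "model_name": ckpt.get("model_name"),
        "patch_size": ckpt.get("patch_size"),
        # 'flops' holds one entry per input-sequence length
        "num_sequence_lengths": len(ckpt.get("flops", [])),
        "missing_keys": [k for k in EXPECTED_KEYS if k not in ckpt],
    }
```

For example, `summarize_checkpoint(load_checkpoint(PATH_TO_CHECKPOINTS))` would report the backbone name and patch size without building the model.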

Requirements

python 3.7.7
pytorch 1.3.1
torchvision 0.4.2
pyyaml 5.3.1 (for RegNets)

Evaluate Pre-trained Models

Read the evaluation results saved in pre-trained models

CUDA_VISIBLE_DEVICES=0 python inference.py --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 0

Read the confidence thresholds saved in pre-trained models and infer the model on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 1

Determine confidence thresholds on the training set and infer the model on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 2
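
To give a feel for what determining confidence thresholds can involve, here is a simplified sketch. It assumes a quantile-style rule: at each step, the most confident remaining samples exit until a target fraction is reached. This rule, the function name `find_thresholds`, and its inputs are assumptions made for illustration; the procedure in inference.py may differ.

```python
from typing import List, Sequence


def find_thresholds(
    confidences: Sequence[Sequence[float]],  # confidences[t][i]: sample i at step t
    exit_fractions: Sequence[float],         # target share of all samples exiting at step t
) -> List[float]:
    """Pick one confidence threshold per step from held-out confidences."""
    if len(confidences) != len(exit_fractions):
        raise ValueError("need one exit fraction per step")
    num_samples = len(confidences[0])
    remaining = list(range(num_samples))  # samples that have not exited yet
    thresholds: List[float] = []
    for step, fraction in enumerate(exit_fractions[:-1]):
        quota = min(len(remaining), round(fraction * num_samples))
        if quota == 0:
            thresholds.append(float("inf"))  # nobody exits at this step
            continue
        ranked = sorted(remaining, key=lambda i: confidences[step][i], reverse=True)
        # The threshold is the confidence of the last sample allowed to exit.
        threshold = confidences[step][ranked[quota - 1]]
        thresholds.append(threshold)
        # Everything at or above the threshold exits, including ties.
        remaining = [i for i in remaining if confidences[step][i] < threshold]
    thresholds.append(0.0)  # the final step always produces an output
    return thresholds
```

Changing `exit_fractions` shifts how much computation is spent on average, which is the trade-off that budgeted batch classification explores.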

The dataset is expected to be prepared as follows:

ImageNet
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
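
Before launching a long evaluation, it can help to confirm that the dataset directory matches the layout above. The sketch below uses only the standard library; `count_class_folders` is a hypothetical helper, not something shipped with this repo.

```python
from pathlib import Path
from typing import Dict


def count_class_folders(root: str) -> Dict[str, int]:
    """Return the number of class sub-folders found under train/ and val/."""
    counts: Dict[str, int] = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each class is expected to be a directory directly under the split.
        counts[split] = sum(1 for p in split_dir.iterdir() if p.is_dir())
    return counts
```

For ImageNet, both splits would normally report the same number of class folders; a mismatch usually means the validation images have not been sorted into per-class directories.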

Training

Here we take training ResNet-50 (96x96, T=5) as an example. All the initialization models and stage-1/2 checkpoints used can be found in Tsinghua Cloud / Google Drive. Currently, this link includes ResNet and MobileNet-V3; we will update it as soon as possible. If you need further help, feel free to contact us.
The results in the paper are based on 2 Tesla V100 GPUs. For most experiments, up to 4 Titan Xp GPUs may be enough.

Training stage 1, which requires initializations of the global encoder (model_prime) and the local encoder (model):

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --data_url PATH_TO_DATASET --train_stage 1 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --model_prime_path PATH_TO_CHECKPOINTS  --model_path PATH_TO_CHECKPOINTS

Training stage 2, which requires a stage-1 checkpoint:

CUDA_VISIBLE_DEVICES=0 python train.py --data_url PATH_TO_DATASET --train_stage 2 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --checkpoint_path PATH_TO_CHECKPOINTS

Training stage 3, which requires a stage-2 checkpoint:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --data_url PATH_TO_DATASET --train_stage 3 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --checkpoint_path PATH_TO_CHECKPOINTS
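
The three stages differ only in the stage number, the GPUs used, and which checkpoint flags are passed, so the commands can be generated by a small script. This is a sketch that uses only the flags shown in the commands above; `build_train_command` is a hypothetical helper, and the fixed `--print_freq 10` simply copies the examples.

```python
import shlex
from typing import Optional, Sequence


def build_train_command(
    stage: int,
    data_url: str,
    gpus: Sequence[int],
    model_arch: str = "resnet50",
    patch_size: int = 96,
    T: int = 5,
    checkpoint_path: Optional[str] = None,   # stages 2 and 3
    model_prime_path: Optional[str] = None,  # stage 1
    model_path: Optional[str] = None,        # stage 1
) -> str:
    """Assemble the shell command for one training stage."""
    if stage not in (1, 2, 3):
        raise ValueError("stage must be 1, 2 or 3")
    args = [
        "python", "train.py",
        "--data_url", data_url,
        "--train_stage", str(stage),
        "--model_arch", model_arch,
        "--patch_size", str(patch_size),
        "--T", str(T),
        "--print_freq", "10",
    ]
    if stage == 1:
        # Stage 1 starts from the two encoder initializations.
        if not (model_prime_path and model_path):
            raise ValueError("stage 1 needs model_prime_path and model_path")
        args += ["--model_prime_path", model_prime_path, "--model_path", model_path]
    else:
        # Stages 2 and 3 resume from the previous stage's checkpoint.
        if not checkpoint_path:
            raise ValueError("stages 2 and 3 need checkpoint_path")
        args += ["--checkpoint_path", checkpoint_path]
    env = "CUDA_VISIBLE_DEVICES=" + ",".join(str(g) for g in gpus)
    return env + " " + " ".join(shlex.quote(a) for a in args)
```

For instance, `build_train_command(2, "PATH_TO_DATASET", [0], checkpoint_path="PATH_TO_CHECKPOINTS")` reproduces the stage-2 command shown above.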

Contact

If you have any questions, please feel free to contact the authors. Yulin Wang: [email protected].

Acknowledgment

Our MobileNet-V3 and EfficientNet code is from here. Our RegNet code is from here.

To Do

Update the code for visualization.
Update the code for mixed-precision training.

Owner

Rainforest Wang
