Glance-and-Focus Networks (PyTorch)

This repo contains the official code and pre-trained models for the glance and focus networks (GFNet).

(NeurIPS 2020) Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification
(T-PAMI) Glance and Focus Networks for Dynamic Visual Recognition

Update on 2020/12/28: Release Training Code.

Update on 2020/10/08: Release Pre-trained Models and the Inference Code on ImageNet.

Introduction

Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification by processing a sequence of relatively small inputs, which are strategically cropped from the original image. Experiments on ImageNet show that our method consistently improves the computational efficiency of a wide variety of deep models. For example, it further reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by 20% without sacrificing accuracy.

Citation

@inproceedings{NeurIPS2020_7866,
        title={Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification},
        author={Wang, Yulin and Lv, Kangchen and Huang, Rui and Song, Shiji and Yang, Le and Huang, Gao},
        booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
        year={2020},
}

@article{huang2023glance,
        title={Glance and Focus Networks for Dynamic Visual Recognition}, 
        author={Huang, Gao and Wang, Yulin and Lv, Kangchen and Jiang, Haojun and Huang, Wenhui and Qi, Pengfei and Song, Shiji},
        journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
        year={2023},
        volume={45},
        number={4},
        pages={4605-4621},
        doi={10.1109/TPAMI.2022.3196959}
}

Results

Top-1 accuracy on ImageNet vs. Multiply-Adds

Top-1 accuracy on ImageNet vs. inference latency (ms) on an iPhone XS Max

Visualization

Pre-trained Models

Backbone CNNs	Patch Size	T	Links
ResNet-50	96x96	5	Tsinghua Cloud / Google Drive
ResNet-50	128x128	5	Tsinghua Cloud / Google Drive
DenseNet-121	96x96	5	Tsinghua Cloud / Google Drive
DenseNet-169	96x96	5	Tsinghua Cloud / Google Drive
DenseNet-201	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-600MF	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-800MF	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-1.6GF	96x96	5	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.00)	96x96	3	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.00)	128x128	3	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.25)	128x128	3	Tsinghua Cloud / Google Drive
EfficientNet-B2	128x128	4	Tsinghua Cloud / Google Drive
EfficientNet-B3	128x128	4	Tsinghua Cloud / Google Drive
EfficientNet-B3	144x144	4	Tsinghua Cloud / Google Drive

Each checkpoint contains:

**.pth.tar
├── model_name: name of the backbone CNN (e.g., resnet50, densenet121)
├── patch_size: size of the image patches (i.e., H' or W' in the paper)
├── model_prime_state_dict, model_state_dict, fc, policy: state dictionaries of the four components of GFNets
├── model_flops, policy_flops, fc_flops: Multiply-Adds of a single forward pass through the encoder, the patch proposal network and the classifier, respectively
├── flops: a list containing the Multiply-Adds corresponding to each length of the input sequence during inference
├── anytime_classification: results of anytime prediction (Top-1 accuracy)
├── dynamic_threshold: the confidence thresholds used in budgeted batch classification
└── budgeted_batch_classification: results of budgeted batch classification (a two-item list; [0] and [1] correspond to the two coordinates of a curve)
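
The fields listed above can be inspected without building the model. The following is a minimal sketch, not part of the repository: `summarize_checkpoint` is a hypothetical helper, and it assumes that `flops` and `anytime_classification` are equal-length lists with one entry per input-sequence length. The source does not say which of the two curve coordinates is the budget and which is the accuracy, so the helper only pairs them up.

```python
from typing import Any, Dict

# Scalar metadata keys documented in the checkpoint listing.
METADATA_KEYS = ("model_name", "patch_size", "model_flops", "policy_flops", "fc_flops")


def summarize_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the metadata and evaluation results stored in a checkpoint dict.

    Hypothetical helper: it only reads the documented keys and skips
    any that are missing.
    """
    summary = {key: ckpt[key] for key in METADATA_KEYS if key in ckpt}

    # Assumption: one (Multiply-Adds, Top-1 accuracy) pair per sequence length.
    if "flops" in ckpt and "anytime_classification" in ckpt:
        summary["anytime"] = list(zip(ckpt["flops"], ckpt["anytime_classification"]))

    # The curve is stored as two parallel coordinate lists; pair them into points.
    if "budgeted_batch_classification" in ckpt:
        first, second = ckpt["budgeted_batch_classification"]
        summary["budgeted_curve"] = list(zip(first, second))

    return summary


# Placeholder values for illustration only; these are NOT real results.
example = {
    "model_name": "resnet50",
    "patch_size": 96,
    "flops": [1.0, 2.0],
    "anytime_classification": [0.5, 0.6],
    "budgeted_batch_classification": [[1.0, 2.0], [0.5, 0.6]],
}
print(summarize_checkpoint(example))
```

With PyTorch installed, the dictionary would typically come from `torch.load(PATH_TO_CHECKPOINTS, map_location="cpu")` instead of the placeholder `example`.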

Requirements

python 3.7.7
pytorch 1.3.1
torchvision 0.4.2
pyyaml 5.3.1 (for RegNets)

Evaluate Pre-trained Models

Read the evaluation results saved in pre-trained models

CUDA_VISIBLE_DEVICES=0 python inference.py --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 0

Read the confidence thresholds saved in pre-trained models and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 1

Determine confidence thresholds on the training set and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 2
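
Evaluation modes 1 and 2 both depend on per-step confidence thresholds. As a rough illustration of how such thresholds can drive early exit, the sketch below stops a sample at the first step whose confidence reaches the threshold and then averages the cost over samples. It is not the code in `inference.py`: `exit_step` and `average_multiply_adds` are hypothetical names, and the sketch assumes that `flops[t]` is the total Multiply-Adds of an input sequence of length `t + 1`, as the `flops` checkpoint field suggests.

```python
from typing import Sequence


def exit_step(confidences: Sequence[float], thresholds: Sequence[float]) -> int:
    """Return the 0-based step at which inference stops for one sample.

    confidences[t] is the largest softmax probability after step t and
    thresholds[t] is the confidence threshold for that step. A sample that
    never reaches a threshold exits at the final step.
    """
    if not confidences or len(confidences) != len(thresholds):
        raise ValueError("need at least one step and one threshold per step")
    for t, (conf, thr) in enumerate(zip(confidences, thresholds)):
        if conf >= thr:
            return t
    return len(confidences) - 1


def average_multiply_adds(exit_steps: Sequence[int], flops: Sequence[float]) -> float:
    """Average cost over samples, where flops[t] is the cost of exiting at step t."""
    if not exit_steps:
        raise ValueError("need at least one sample")
    return sum(flops[t] for t in exit_steps) / len(exit_steps)
```

Lower thresholds let more samples exit early, which reduces the average cost. Presumably this trade-off is what budgeted batch classification tunes.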

The dataset is expected to be prepared as follows:

ImageNet
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
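
Before launching a long run, it can help to confirm that the directory matches this layout. The following is a minimal sketch using only the standard library. `check_imagenet_layout` is a hypothetical helper, not part of the repository, and it assumes the usual one-folder-per-class convention (the one torchvision's `ImageFolder` follows), under which `train` and `val` should contain the same class folders.

```python
from pathlib import Path
from typing import Dict, List


def check_imagenet_layout(root: str) -> Dict[str, List[str]]:
    """Return the sorted class-folder names found under root/train and root/val.

    Raises FileNotFoundError if a split directory is missing and ValueError
    if the two splits do not contain the same class folders.
    """
    classes: Dict[str, List[str]] = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Only sub-directories count as classes; stray files are ignored.
        classes[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())

    if classes["train"] != classes["val"]:
        raise ValueError("train and val contain different class folders")
    return classes
```

Calling `check_imagenet_layout("PATH_TO_DATASET")` on a correctly prepared ImageNet copy should return two identical lists of class-folder names.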

Training

Here we take training ResNet-50 (96x96, T=5) as an example. All the initialization models and stage-1/2 checkpoints used can be found on Tsinghua Cloud / Google Drive. Currently, this link includes ResNet and MobileNet-V3; we will update it as soon as possible. If you need further help, feel free to contact us.
The results in the paper are based on 2 Tesla V100 GPUs. For most experiments, up to 4 Titan Xp GPUs may be enough.

Training stage 1. The initializations of the global encoder (model_prime) and the local encoder (model) are required:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --data_url PATH_TO_DATASET --train_stage 1 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --model_prime_path PATH_TO_CHECKPOINTS  --model_path PATH_TO_CHECKPOINTS

Training stage 2. A stage-1 checkpoint is required:

CUDA_VISIBLE_DEVICES=0 python train.py --data_url PATH_TO_DATASET --train_stage 2 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --checkpoint_path PATH_TO_CHECKPOINTS

Training stage 3. A stage-2 checkpoint is required:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --data_url PATH_TO_DATASET --train_stage 3 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --checkpoint_path PATH_TO_CHECKPOINTS
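
The three commands above differ only in the stage number and in which paths they require. The sketch below assembles the argument list for a given stage and rejects a call that lacks the paths that stage needs. `build_train_command` is a hypothetical wrapper, not part of the repository. It uses only the flags shown in the commands above, with the ResNet-50 (96x96, T=5) values as defaults.

```python
from typing import List, Optional


def build_train_command(
    stage: int,
    data_url: str,
    checkpoint_path: Optional[str] = None,
    model_prime_path: Optional[str] = None,
    model_path: Optional[str] = None,
    model_arch: str = "resnet50",
    patch_size: int = 96,
    T: int = 5,
    print_freq: int = 10,
) -> List[str]:
    """Build the argument list for one training stage of train.py."""
    cmd = [
        "python", "train.py",
        "--data_url", data_url,
        "--train_stage", str(stage),
        "--model_arch", model_arch,
        "--patch_size", str(patch_size),
        "--T", str(T),
        "--print_freq", str(print_freq),
    ]
    if stage == 1:
        # Stage 1 starts from initializations of both encoders.
        if not (model_prime_path and model_path):
            raise ValueError("stage 1 requires model_prime_path and model_path")
        cmd += ["--model_prime_path", model_prime_path, "--model_path", model_path]
    elif stage in (2, 3):
        # Stages 2 and 3 resume from the checkpoint of the previous stage.
        if not checkpoint_path:
            raise ValueError(f"stage {stage} requires checkpoint_path")
        cmd += ["--checkpoint_path", checkpoint_path]
    else:
        raise ValueError("stage must be 1, 2 or 3")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True, env={**os.environ, "CUDA_VISIBLE_DEVICES": "0,1,2,3"})` to select GPUs in the same way as the shell commands above.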

Contact

If you have any questions, please feel free to contact the authors. Yulin Wang: wang-yl19@mails.tsinghua.edu.cn.

Acknowledgment

Our code of MobileNet-V3 and EfficientNet is from here. Our code of RegNet is from here.

To Do

Update the code for visualization.
Update the code for mixed-precision training.
