Improving Convolutional Networks via Attention Transfer (ICLR 2017)

Last update: Dec 23, 2022

Overview

Attention Transfer

PyTorch code for "Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer" https://arxiv.org/abs/1612.03928
Conference paper at ICLR2017: https://openreview.net/forum?id=Sks9_ajex

What's in this repo so far:

Activation-based AT code for CIFAR-10 experiments
Code for ImageNet experiments (ResNet-18-ResNet-34 student-teacher)
Jupyter notebook to visualize attention maps of ResNet-34: visualize-attention.ipynb

Coming:

grad-based AT
Scenes and CUB activation-based AT code

The code uses PyTorch https://pytorch.org. Note that the original experiments were done using torch-autograd. So far we have validated that the CIFAR-10 experiments are exactly reproducible in PyTorch, and we are in the process of doing the same for ImageNet (results are very slightly worse in PyTorch, due to hyperparameters).

bibtex:

@inproceedings{Zagoruyko2017AT,
    author = {Sergey Zagoruyko and Nikos Komodakis},
    title = {Paying More Attention to Attention: Improving the Performance of
             Convolutional Neural Networks via Attention Transfer},
    booktitle = {ICLR},
    url = {https://arxiv.org/abs/1612.03928},
    year = {2017}}

Requirements

First install PyTorch, then install torchnet:

pip install git+https://github.com/pytorch/tnt.git@master

then install other Python packages:

pip install -r requirements.txt

Experiments

CIFAR-10

This section describes how to reproduce the results in Table 1 of the paper.

First, train teachers:

python cifar.py --save logs/resnet_40_1_teacher --depth 40 --width 1
python cifar.py --save logs/resnet_16_2_teacher --depth 16 --width 2
python cifar.py --save logs/resnet_40_2_teacher --depth 40 --width 2

To train with activation-based AT do:

python cifar.py --save logs/at_16_1_16_2 --teacher_id resnet_16_2_teacher --beta 1e+3
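For readers who want to see what the `--beta` term weighs, here is a minimal sketch of an activation-based attention-transfer loss in PyTorch. It is an illustration under stated assumptions, not necessarily the exact code in cifar.py: the function names (`attention_map`, `at_loss`, `total_loss`) are hypothetical, the attention map is taken as the channel-wise mean of squared activations, and the `beta / 2` scaling of the summed per-group losses is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F


def attention_map(feats: torch.Tensor) -> torch.Tensor:
    """Collapse (N, C, H, W) activations into an (N, H*W) spatial attention vector."""
    # Average squared activations over the channel axis -> (N, H, W).
    amap = feats.pow(2).mean(dim=1)
    # Flatten the spatial grid and L2-normalize each sample's vector.
    return F.normalize(amap.flatten(1), dim=1)


def at_loss(student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
    """Mean squared difference between normalized student and teacher attention maps."""
    diff = attention_map(student_feats) - attention_map(teacher_feats)
    return diff.pow(2).mean()


def total_loss(logits, labels, student_groups, teacher_groups, beta=1e3):
    """Cross-entropy plus a beta-weighted sum of attention losses, one per layer group."""
    ce = F.cross_entropy(logits, labels)
    # The teacher is fixed, so its activations are detached from the graph.
    at = sum(at_loss(s, t.detach()) for s, t in zip(student_groups, teacher_groups))
    return ce + beta / 2 * at  # beta/2 scaling is an assumption of this sketch
```

Because the map averages over channels, student and teacher activations only need to share spatial size, not width, which is why a narrow student can be matched against a wider teacher.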

To train with KD:

python cifar.py --save logs/kd_16_1_16_2 --teacher_id resnet_16_2_teacher --alpha 0.9
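As a rough illustration of what `--alpha` balances, the following sketch implements a standard knowledge-distillation loss: a temperature-softened KL term against the teacher, mixed with ordinary cross-entropy on the labels. The function name `kd_loss` and the temperature default `T=4.0` are assumptions made for this example and may differ from what cifar.py actually uses.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, alpha=0.9, T=4.0):
    """alpha * softened KL(teacher || student) + (1 - alpha) * cross-entropy."""
    # Student log-probabilities and teacher probabilities at temperature T.
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits.detach() / T, dim=1)
    # "batchmean" sums over classes and averages over the batch;
    # the T**2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(log_p, q, reduction="batchmean") * T ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

With `alpha=0.9`, as in the command above, most of the training signal comes from the teacher's softened outputs and only a tenth from the ground-truth labels.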

We plan to add AT+KD with decaying beta soon; this combination gives the best knowledge transfer results.

ImageNet

Pretrained model

We provide a ResNet-18 model pretrained with activation-based AT:

Model	val error, % (top-1, top-5)
ResNet-18	30.4, 10.8
ResNet-18-ResNet-34-AT	29.3, 10.0

Download link: https://s3.amazonaws.com/modelzoo-networks/resnet-18-at-export.pth

Model definition: https://github.com/szagoruyko/functional-zoo/blob/master/resnet-18-at-export.ipynb

Train from scratch

Download pretrained weights for ResNet-34 (see also functional-zoo for more information):

wget https://s3.amazonaws.com/modelzoo-networks/resnet-34-export.pth
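Before starting a long run it can help to sanity-check the downloaded weights. The sketch below assumes the exported file deserializes with `torch.load` into a flat mapping from parameter names to tensors; that layout is an assumption, and the helper name `summarize_params` is hypothetical.

```python
import os

import torch


def summarize_params(params):
    """Return sorted (name, shape) pairs and the total number of elements."""
    shapes = [(name, tuple(t.shape)) for name, t in sorted(params.items())]
    total = sum(t.numel() for t in params.values())
    return shapes, total


if __name__ == "__main__":
    path = "resnet-34-export.pth"  # file fetched by the wget command above
    if os.path.exists(path):
        # Assumes a flat {name: tensor} dict; other layouts need different handling.
        params = torch.load(path, map_location="cpu")
        shapes, total = summarize_params(params)
        for name, shape in shapes:
            print(name, shape)
        print("total parameters:", total)
```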

Prepare the data following fb.resnet.torch and run training (e.g. using 2 GPUs):

python imagenet.py --imagenetpath ~/ILSVRC2012 --depth 18 --width 1 \
                   --teacher_params resnet-34-export.hkl --gpu_id 0,1 --ngpu 2 \
                   --beta 1e+3
