Official implement of Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer

Overview

Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer

This repository contains the PyTorch code for Evo-ViT.

This work proposes a slow-fast token evolution approach to accelerate vanilla vision transformers of both flat and deep-narrow structures without additional pre-training and fine-tuning procedures. For details please see Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer by Yifan Xu*, Zhijie Zhang*, Mengdan Zhang, Kekai Sheng, Ke Li, Weiming Dong, Liqing Zhang, Changsheng Xu, and Xing Sun. intro

Our code is based on pytorch-image-models, DeiT, and LeViT.

Preparation

Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for the torchvision datasets.ImageFolder, and the training and validation data is expected to be in the train/ folder and val folder respectively.

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class/2
      img4.jpeg

All distillation settings are conducted with a teacher model RegNetY-160, which is available at teacher checkpoint.

Install the requirements by running:

pip3 install -r requirements.txt

NOTE that all experiments in the paper are conducted under cuda11.0. If necessary, please install the following packages under the environment with CUDA version 11.0: torch1.7.0-cu110, torchvision-0.8.1-cu110.

Model Zoo

We provide our Evo-ViT models pretrained on ImageNet:

Name Top-1 Acc (%) Throughput (img/s) Url
Evo-ViT-T 72.0 4027 Google Drive
Evo-ViT-S 79.4 1510 Google Drive
Evo-ViT-B 81.3 462 Google Drive
Evo-LeViT-128S 73.0 10135 Google Drive
Evo-LeViT-128 74.4 8323 Google Drive
Evo-LeViT-192 76.8 6148 Google Drive
Evo-LeViT-256 78.8 4277 Google Drive
Evo-LeViT-384 80.7 2412 Google Drive
Evo-ViT-B* 82.0 139 Google Drive
Evo-LeViT-256* 81.1 1285 Google Drive
Evo-LeViT-384* 82.2 712 Google Drive

The input image resolution is 224 × 224 unless specified. * denotes the input image resolution is 384 × 384.

Usage

Evaluation

To evaluate a pre-trained model, run:

python3 main_deit.py --model evo_deit_small_patch16_224 --eval --resume /path/to/checkpoint.pth --batch-size 256 --data-path /path/to/imagenet

Training with input resolution of 224

To train Evo-ViT on ImageNet on a single node with 8 gpus for 300 epochs, run:

Evo-ViT-T

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_deit.py --model evo_deit_tiny_patch16_224 --drop-path 0 --batch-size 256 --data-path /path/to/imagenet --output_dir /path/to/save

Evo-ViT-S

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_deit.py --model evo_deit_small_patch16_224 --batch-size 128 --data-path /path/to/imagenet --output_dir /path/to/save

Sometimes loss Nan happens in the early training epochs of DeiT-B, which is described in this issue. Our solution is to reduce the batch size to 128, load a warmup checkpoint trained for 9 epochs, and train Evo-ViT for the remaining 291 epochs. To train Evo-ViT-B on ImageNet on a single node with 8 gpus for 300 epochs, run:

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_deit.py --model evo_deit_base_patch16_224 --batch-size 128 --data-path /path/to/imagenet --output_dir /path/to/save --resume /path/to/warmup_checkpoint.pth

To train Evo-LeViT-128 on ImageNet on a single node with 8 gpus for 300 epochs, run:

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_levit.py --model EvoLeViT_128 --batch-size 256 --data-path /path/to/imagenet --output_dir /path/to/save

The other models of Evo-LeViT are trained with the same command as mentioned above.

Training with input resolution of 384

To train Evo-ViT-B* on ImageNet on 2 nodes with 8 gpus each for 300 epochs, run:

python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=$NODE_SIZE  --node_rank=$NODE_RANK --master_port=$MASTER_PORT --master_addr=$MASTER_ADDR main_deit.py --model evo_deit_base_patch16_384 --input-size 384 --batch-size 64 --data-path /path/to/imagenet --output_dir /path/to/save

To train Evo-ViT-S* on ImageNet on a single node with 8 gpus for 300 epochs, run:

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_deit.py --model evo_deit_small_patch16_384 --batch-size 128 --input-size 384 --data-path /path/to/imagenet --output_dir /path/to/save"

To train Evo-LeViT-384* on ImageNet on a single node with 8 gpus for 300 epochs, run:

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_levit.py --model EvoLeViT_384_384 --input-size 384 --batch-size 128 --data-path /path/to/imagenet --output_dir /path/to/save

The other models of Evo-LeViT* are trained with the same command of Evo-LeViT-384*.

Testing inference throughput

To test inference throughput, first modify the model name in line 153 of benchmark.py. Then, run:

python3 benchmark.py

The defauld input resolution is 224. To test inference throughput with input resolution of 384, please add the parameter "--img_size 384"

Visualization of token selection

The visualization code is modified from DynamicViT.

To visualize a batch of ImageNet val images, run:

python3 visualize.py --model evo_deit_small_vis_patch16_224 --resume /path/to/checkpoint.pth --output_dir /path/to/save --data-path /path/to/imagenet --batch-size 64 

To visualize a single image, run:

python3 visualize.py --model evo_deit_small_vis_patch16_224 --resume /path/to/checkpoint.pth --output_dir /path/to/save --img-path ./imgs/a.jpg --save-name evo_test

Add parameter '--layer-wise-prune' if the visualized model is not trained with layer-to-stage training strategy.

The visualization results of Evo-ViT-S are as follows:

result

Citation

If you find our work useful in your research, please consider citing:

@article{xu2021evo,
  title={Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer},
  author={Xu, Yifan and Zhang, Zhijie and Zhang, Mengdan and Sheng, Kekai and Li, Ke and Dong, Weiming and Zhang, Liqing and Xu, Changsheng and Sun, Xing},
  journal={arXiv preprint arXiv:2108.01390},
  year={2021}
}
Owner
YifanXu
But gold will glitter forever.
YifanXu
U-Net Brain Tumor Segmentation

U-Net Brain Tumor Segmentation 🚀 :Feb 2019 the data processing implementation in this repo is not the fastest way (code need update, contribution is

Hao 448 Jan 02, 2023
Official implementation of Deep Burst Super-Resolution

Deep-Burst-SR Official implementation of Deep Burst Super-Resolution Publication: Deep Burst Super-Resolution. Goutam Bhat, Martin Danelljan, Luc Van

Goutam Bhat 113 Dec 19, 2022
Building Ellee — A GPT-3 and Computer Vision Powered Talking Robotic Teddy Bear With Human Level Conversation Intelligence

Using an object detection and facial recognition system built on MobileNetSSDV2 and Dlib and running on an NVIDIA Jetson Nano, a GPT-3 model, Google Speech Recognition, Amazon Polly and servo motors,

24 Oct 26, 2022
A PyTorch Implementation of "Watch Your Step: Learning Node Embeddings via Graph Attention" (NeurIPS 2018).

Attention Walk ⠀⠀ A PyTorch Implementation of Watch Your Step: Learning Node Embeddings via Graph Attention (NIPS 2018). Abstract Graph embedding meth

Benedek Rozemberczki 303 Dec 09, 2022
FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation

FIRA is a learning-based commit message generation approach, which first represents code changes via fine-grained graphs and then learns to generate commit messages automatically.

Van 21 Dec 30, 2022
The personal repository of the work: *DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer*.

DanceNet3D The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer. Dataset and Results Pleas

南嘉Nanga 36 Dec 21, 2022
This was initially the repo for the project of [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media"

Subreddit Analysis This repo includes tools for Subreddit analysis, originally developed for our class project of PSYC 626 in USC, titled "Powered by

Georgios Chochlakis 1 Dec 17, 2021
A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows"

OutliersSlidingWindows A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows" Dataset generatio

PaoloPellizzoni 0 Jan 05, 2022
A crossplatform menu bar application using mpv as DLNA Media Renderer.

Macast Chinese README A menu bar application using mpv as DLNA Media Renderer. Install MacOS || Windows || Debian Download link: Macast release latest

4.4k Jan 01, 2023
Tensorflow implementation for Self-supervised Graph Learning for Recommendation

If the compilation is successful, the evaluator of cpp implementation will be called automatically. Otherwise, the evaluator of python implementation will be called.

152 Jan 07, 2023
A framework for multi-step probabilistic time-series/demand forecasting models

JointDemandForecasting.py A framework for multi-step probabilistic time-series/demand forecasting models File stucture JointDemandForecasting contains

Stanford Intelligent Systems Laboratory 3 Sep 28, 2022
Simulation of the solar system using various nummerical methods

solar-system Simulation of the solar system using various nummerical methods Download the repo Make shure matplotlib, scipy etc. are installed execute

Caspar 7 Jul 15, 2022
Latent Execution for Neural Program Synthesis

Latent Execution for Neural Program Synthesis This repo provides the code to replicate the experiments in the paper Xinyun Chen, Dawn Song, Yuandong T

Xinyun Chen 16 Oct 02, 2022
Kinetics-Data-Preprocessing

Kinetics-Data-Preprocessing Kinetics-400 and Kinetics-600 are common video recognition datasets used by popular video understanding projects like Slow

Kaihua Tang 7 Oct 27, 2022
Minimal PyTorch implementation of Generative Latent Optimization from the paper "Optimizing the Latent Space of Generative Networks"

Minimal PyTorch implementation of Generative Latent Optimization This is a reimplementation of the paper Piotr Bojanowski, Armand Joulin, David Lopez-

Thomas Neumann 117 Nov 27, 2022
Deep Crop Rotation

Deep Crop Rotation Paper (to come very soon!) We propose a deep learning approach to modelling both inter- and intra-annual patterns for parcel classi

Félix Quinton 5 Sep 23, 2022
CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification (ICCV2021)

CM-NAS Official Pytorch code of paper CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification in ICCV2021. Vis

JDAI-CV 40 Nov 25, 2022
The code used for the free [email protected] Webinar series on Reinforcement Learning in Finance

Reinforcement Learning in Finance [email protected] Webinar This repository provides the code f

Yves Hilpisch 62 Dec 22, 2022
DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes with Biharmonic Coordinates

DeepMetaHandles (CVPR2021 Oral) [paper] [animations] DeepMetaHandles is a shape deformation technique. It learns a set of meta-handles for each given

Liu Minghua 73 Dec 15, 2022
An open source bike computer based on Raspberry Pi Zero (W, WH) with GPS and ANT+. Including offline map and navigation.

Pi Zero Bikecomputer An open-source bike computer based on Raspberry Pi Zero (W, WH) with GPS and ANT+ https://github.com/hishizuka/pizero_bikecompute

hishizuka 264 Jan 02, 2023