Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations

Related tags

Text Data & NLPevit
Overview

Expediting Vision Transformers via Token Reorganizations

This repository contains PyTorch evaluation code, training code and pretrained EViT models for the ICLR 2022 Spotlight paper:

Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations

Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie

The proposed EViT models obtain competitive tradeoffs in terms of speed / precision:

EViT

If you use this code for a paper please cite:

@inproceedings{liang2022evit,
title={Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations},
author={Youwei Liang and Chongjian Ge and Zhan Tong and Yibing Song and Jue Wang and Pengtao Xie},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=BjyvwnXXVn_}
}

Model Zoo

We provide EViT-DeiT-S models pretrained on ImageNet 2012.

Token fusion Keep rate [email protected] [email protected] #Params URL
0.9 79.8 95.0 22.1M model
0.8 79.8 94.9 22.1M model
0.7 79.5 94.8 22.1M model
0.6 78.9 94.5 22.1M model
0.5 78.5 94.2 22.1M model
0.9 79.9 94.9 22.1M model
0.8 79.7 94.8 22.1M model
0.7 79.4 94.7 22.1M model
0.6 79.1 94.5 22.1M model
0.5 78.4 94.1 22.1M model

Preparation

The reported results in the paper were obtained with models trained with 16 NVIDIA A100 GPUs using Python3.6 and the following packages

torch==1.9.0
torchvision==0.10.0
timm==0.4.12
tensorboardX==2.4
torchprofile==0.0.4
lmdb==1.2.1
pyarrow==5.0.0

These packages can be installed by running pip install -r requirements.txt.

Data preparation

Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for the torchvision datasets.ImageFolder, and the training and validation data is expected to be in the train/ folder and val folder respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class/2
      img4.jpeg

We use the same datasets as in DeiT. You can optionally use an LMDB dataset for ImageNet by building it using folder2lmdb.py and passing --use-lmdb to main.py, which may speed up data loading.

Usage

First, clone the repository locally:

git clone https://github.com/youweiliang/evit.git

Change directory to the cloned repository by running cd evit, install necessary packages, and prepare the datasets.

Training

To train EViT/0.7-DeiT-S on ImageNet, set the datapath (path to dataset) and logdir (logging directory) in run_code.sh properly and run bash ./run_code.sh (--nproc_per_node should be modified if necessary). Note that the batch size in the paper is 16x128=2048.

Set --base_keep_rate in run_code.sh to use a different keep rate, and set --fuse_token to configure whether to use inattentive token fusion.

Training/Finetuning on higher resolution images

To training on images with a (higher) resolution h, set --input-size h in run_code.sh.

Multinode training

Please refer to DeiT for multinode training.

Finetuning

First set the datapath, logdir, and ckpt (the model checkpoint for finetuning) in run_code.sh, and then run bash ./finetune.sh.

Evaluation

To evaluate a pre-trained EViT/0.7-DeiT-S model on ImageNet val with a single GPU run (replacing checkpoint with the actual file):

python3 main.py --model deit_small_patch16_shrink_base --fuse_token --base_keep_rate 0.7 --eval --resume checkpoint --data-path /path/to/imagenet

You can also pass --dist-eval to use multiple GPUs for evaluation.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

We would like to think the authors of DeiT, based on which this project is built.

Owner
Youwei Liang
Youwei Liang
A library for finding knowledge neurons in pretrained transformer models.

knowledge-neurons An open source repository replicating the 2021 paper Knowledge Neurons in Pretrained Transformers by Dai et al., and extending the t

EleutherAI 96 Dec 21, 2022
Open-World Entity Segmentation

Open-World Entity Segmentation Project Website Lu Qi*, Jason Kuen*, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia This projec

DV Lab 408 Dec 29, 2022
Tool to add main subject to items on Wikidata using a WMFs CirrusSearch for named entity recognition or a manually supplied list of QIDs

ItemSubjector Tool made to add main subject statements to items based on the title using a home-brewed CirrusSearch-based Named Entity Recognition alg

Dennis Priskorn 9 Nov 17, 2022
Predict an emoji that is associated with a text

Sentiment Analysis Sentiment analysis in computational linguistics is a general term for techniques that quantify sentiment or mood in a text. Can you

Tetsumichi(Telly) Umada 30 Sep 07, 2022
CoNLL-English NER Task (NER in English)

CoNLL-English NER Task en | ch Motivation Course Project review the pytorch framework and sequence-labeling task practice using the transformers of Hu

Kevin 2 Jan 14, 2022
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset

PyTorch Large-Scale Language Model A Large-Scale PyTorch Language Model trained on the 1-Billion Word (LM1B) / (GBW) dataset Latest Results 39.98 Perp

Ryan Spring 114 Nov 04, 2022
This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Proteno This is the data release associated with the corresponding NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deploymen

37 Dec 04, 2022
A collection of Korean Text Datasets ready to use using Tensorflow-Datasets.

tfds-korean A collection of Korean Text Datasets ready to use using Tensorflow-Datasets. TensorFlow-Datasets를 이용한 한국어/한글 데이터셋 모음입니다. Dataset Catalog |

Jeong Ukjae 20 Jul 11, 2022
Help you discover excellent English projects and get rid of disturbing by other spoken language

GitHub English Top Charts 「Help you discover excellent English projects and get

GrowingGit 544 Jan 09, 2023
Clone a voice in 5 seconds to generate arbitrary speech in real-time

This repository is forked from Real-Time-Voice-Cloning which only support English. English | 中文 Features 🌍 Chinese supported mandarin and tested with

Weijia Chen 25.6k Jan 06, 2023
CoSENT、STS、SentenceBERT

CoSENT_Pytorch 比Sentence-BERT更有效的句向量方案

102 Dec 07, 2022
SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering.

SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering. Contents Inst

0 Oct 21, 2021
Repository for Project Insight: NLP as a Service

Project Insight NLP as a Service Contents Introduction Features Installation Setup and Documentation Project Details Demonstration Directory Details H

Abhishek Kumar Mishra 286 Dec 06, 2022
Just a Basic like Language for Zeno INC

zeno-basic-language Just a Basic like Language for Zeno INC This is written in 100% python. this is basic language like language. so its not for big p

Voidy Devleoper 1 Dec 18, 2021
Code for the paper "Are Sixteen Heads Really Better than One?"

Are Sixteen Heads Really Better than One? This repository contains code to reproduce the experiments in our paper Are Sixteen Heads Really Better than

Paul Michel 143 Dec 14, 2022
The official repository of the ISBI 2022 KNIGHT Challenge

KNIGHT The official repository holding the data for the ISBI 2022 KNIGHT Challenge About The KNIGHT Challenge asks teams to develop models to classify

Nicholas Heller 4 Jan 22, 2022
The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank

Main Idea The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank Semantic Search Re

Sergio Arnaud Gomez 2 Jan 28, 2022
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing

Token Shift GPT Implementation of Token Shift GPT - An autoregressive model that relies solely on shifting along the sequence dimension and feedforwar

Phil Wang 32 Oct 14, 2022
This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular intervals.It sends out the most recent news at random!

Nepali-news-notifier This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular in

Sachit Yadav 1 Feb 11, 2022