Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.

Overview

Less is More: Pay Less Attention in Vision Transformers

Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.

By Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu and Jianfei Cai.

In our paper, we present a novel Less attention vIsion Transformer (LIT), building upon the fact that convolutions, fully-connected (FC) layers, and self-attentions have almost equivalent mathematical expressions for processing image patch sequences. LIT uses pure multi-layer perceptrons (MLPs) to encode rich local patterns in the early stages while applying self-attention modules to capture longer dependencies in deeper layers. Moreover, we further propose a learned deformable token merging module to adaptively fuse informative patches in a non-uniform manner.

If you use this code for a paper please cite:

@article{pan2021less,
  title={Less is More: Pay Less Attention in Vision Transformers},
  author={Pan, Zizheng and Zhuang, Bohan and He, Haoyu and Liu, Jing and Cai, Jianfei},
  journal={arXiv preprint arXiv:2105.14217},
  year={2021}
}

Usage

First, clone this repository.

git clone https://github.com/MonashAI/LIT

Next, create a conda virtual environment.

# Make sure you have a NVIDIA GPU.
cd LIT/
bash setup_env.sh [conda_install_path] [env_name]

# For example
bash setup_env.sh /home/anaconda3 lit

Note: We use PyTorch 1.7.1 with CUDA 10.1 for all experiments. The setup_env.sh has illustrated all dependencies we used in our experiments. You may want to edit this file to install a different version of PyTorch or any other packages.

Data Preparation

Download the ImageNet 2012 dataset from here, and prepare the dataset based on this script. The file structure should look like:

imagenet
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img4.jpeg
    │   ├── img5.jpeg
    │   └── ...
    ├── class2
    │   ├── img6.jpeg
    │   └── ...
    └── ...

Model Zoo

We provide baseline LIT models pretrained on ImageNet 2012.

Name Params (M) FLOPs (G) Top-1 Acc. (%) Model Log
LIT-Ti 19 3.6 81.1 google drive/github log
LIT-S 27 4.1 81.5 google drive/github log
LIT-M 48 8.6 83.0 google drive/github log
LIT-B 86 15.0 83.4 google drive/github log

Training and Evaluation

In our implementation, we have different training strategies for LIT-Ti and other LIT models. Therefore, we provide two codebases.

For LIT-Ti, please refer to code_for_lit_ti.

For LIT-S, LIT-M, LIT-B, please refer to code_for_lit_s_m_b.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

This repository has adopted codes from DeiT, PVT and Swin, we thank the authors for their open-sourced code.

You might also like...
This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.

Introduction This is an official implementation of CvT: Introducing Convolutions to Vision Transformers. We present a new architecture, named Convolut

Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention

cosFormer Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention Update log 2022/2/28 Add core code License This

Official implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

CrossViT This repository is the official implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. ArXiv If

The official implementation of ELSA: Enhanced Local Self-Attention for Vision Transformer
The official implementation of ELSA: Enhanced Local Self-Attention for Vision Transformer

ELSA: Enhanced Local Self-Attention for Vision Transformer By Jingkai Zhou, Pich

Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch.

SE3 Transformer - Pytorch Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch. May be needed for replicating Alphafold2 resu

Pytorch implementation for  our ICCV 2021 paper
Pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering".

TRAnsformer Routing Networks (TRAR) This is an official implementation for ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visu

PyTorch Implementation of CvT: Introducing Convolutions to Vision Transformers
PyTorch Implementation of CvT: Introducing Convolutions to Vision Transformers

CvT: Introducing Convolutions to Vision Transformers Pytorch implementation of CvT: Introducing Convolutions to Vision Transformers Usage: img = torch

A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers.
A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers.

ViTGAN: Training GANs with Vision Transformers A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers. Refer

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch
Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

Comments
  • Problem about DCN

    Problem about DCN

    I have some problems about compiling DCN, thus it is hard for me to use the DCN.deform_conv2d_forward and DCN.deform_conv2d_backward functions. Can 'deform_conv2d_naive' be used instead of this part? Or is there other methods for me to accomplish this DCN.deform_conv2d part?

    opened by jarygrace 3
  • How to use LITNet as a beckbone of object detection

    How to use LITNet as a beckbone of object detection

    Could you please release the code of using LITNet as a beckbone of RetinaNet as mentioned in your paper? I wonder how to use it as a beckbone for object detection...

    opened by sunhuisunhui 2
Generate image analogies using neural matching and blending

neural image analogies This is basically an implementation of this "Image Analogies" paper, In our case, we use feature maps from VGG16. The patch mat

Adam Wentz 3.5k Jan 08, 2023
Steerable discovery of neural audio effects

Steerable discovery of neural audio effects Christian J. Steinmetz and Joshua D. Reiss Abstract Applications of deep learning for audio effects often

Christian J. Steinmetz 182 Dec 29, 2022
Art Project "Schrödinger's Game of Life"

Repo of the project "Team Creative Quantum AI: Schrödinger's Game of Life" Installation new conda env: conda create --name qcml python=3.8 conda activ

ℍ◮ℕℕ◭ℍ ℝ∈ᛔ∈ℝ 2 Sep 15, 2022
Hierarchical Motion Encoder-Decoder Network for Trajectory Forecasting (HMNet)

Hierarchical Motion Encoder-Decoder Network for Trajectory Forecasting (HMNet) Our paper: https://arxiv.org/abs/2111.13324 We will release the complet

15 Oct 17, 2022
Aws-machine-learning-university-accelerated-tab - Machine Learning University: Accelerated Tabular Data Class

Machine Learning University: Accelerated Tabular Data Class This repository contains slides, notebooks, and datasets for the Machine Learning Universi

AWS Samples 916 Dec 23, 2022
Tandem Mass Spectrum Prediction with Graph Transformers

MassFormer This is the original implementation of MassFormer, a graph transformer for small molecule MS/MS prediction. Check out the preprint on arxiv

Röst Lab 13 Oct 27, 2022
Run containerized, rootless applications with podman

Why? restrict scope of file system access run any application without root privileges creates usable "Desktop applications" to integrate into your nor

119 Dec 27, 2022
Volumetric parameterization of the placenta to a flattened template

placenta-flattening A MATLAB algorithm for volumetric mesh parameterization. Developed for mapping a placenta segmentation derived from an MRI image t

Mazdak Abulnaga 12 Mar 14, 2022
A simple interface for editing natural photos with generative neural networks.

Neural Photo Editor A simple interface for editing natural photos with generative neural networks. This repository contains code for the paper "Neural

Andy Brock 2.1k Dec 29, 2022
Bytedance Inc. 2.5k Jan 06, 2023
A Genetic Programming platform for Python with TensorFlow for wicked-fast CPU and GPU support.

Karoo GP Karoo GP is an evolutionary algorithm, a genetic programming application suite written in Python which supports both symbolic regression and

Kai Staats 149 Jan 09, 2023
Parris, the automated infrastructure setup tool for machine learning algorithms.

README Parris, the automated infrastructure setup tool for machine learning algorithms. What Is This Tool? Parris is a tool for automating the trainin

Joseph Greene 319 Aug 02, 2022
HandFoldingNet ✌️ : A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton

HandFoldingNet ✌️ : A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton Wencan Cheng, Jae Hyun Park, Jong

cwc1260 23 Oct 21, 2022
Code accompanying our NeurIPS 2021 traffic4cast challenge

Traffic forecasting on traffic movie snippets This repo contains all code to reproduce our approach to the IARAI Traffic4cast 2021 challenge. In the c

Nina Wiedemann 2 Aug 09, 2022
IPATool-py: download ipa easily

IPATool-py Python version of IPATool! Installation pip3 install -r requirements.txt Usage Quickstart: download app with specific bundleId into DIR: p

159 Dec 30, 2022
Pytorch implementation of NEGEV method. Paper: "Negative Evidence Matters in Interpretable Histology Image Classification".

Pytorch 1.10.0 code for: Negative Evidence Matters in Interpretable Histology Image Classification (https://arxiv. org/abs/xxxx.xxxxx) Citation: @arti

Soufiane Belharbi 4 Dec 01, 2022
Linear algebra python - Number of operations and problems in Linear Algebra and Numerical Linear Algebra

Linear algebra in python Number of operations and problems in Linear Algebra and

Alireza 5 Oct 09, 2022
🍀 Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.⭐⭐⭐

🍀 Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.⭐⭐⭐

xmu-xiaoma66 7.7k Jan 05, 2023
Source Code of NeurIPS21 paper: Recognizing Vector Graphics without Rasterization

YOLaT-VectorGraphicsRecognition This repository is the official PyTorch implementation of our NeurIPS-2021 paper: Recognizing Vector Graphics without

Microsoft 49 Dec 20, 2022
an Evolutionary Algorithm assisted GAN

EvoGAN an Evolutionary Algorithm assisted GAN ckpts

3 Oct 09, 2022