The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer"

Last update: Nov 29, 2022

Shuffle Transformer

Introduction

Recently, window-based Transformers, which compute self-attention within non-overlapping local windows, have demonstrated promising results on image classification, semantic segmentation, and object detection. However, little study has been devoted to cross-window connections, which are the key element for improving representation ability. Shuffle Transformer revisits the spatial shuffle as an efficient way to build connections among windows; it is easy to implement, requiring changes to only two lines of code. Furthermore, depth-wise convolution is introduced to complement the spatial shuffle by enhancing neighbor-window connections. The proposed architectures achieve excellent performance on a wide range of visual tasks, including image-level classification, object detection, and semantic segmentation.
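
The regrouping behind the spatial shuffle can be illustrated with a minimal sketch. It is not the code from this repository: it assumes a 1D sequence of token indices instead of a 2D feature map, and the function names local_windows and shuffled_windows are hypothetical. It only shows which tokens would share a window with and without the shuffle.

```python
def local_windows(tokens, window_size):
    """Group consecutive tokens: token index = window * window_size + offset."""
    if len(tokens) % window_size != 0:
        raise ValueError("sequence length must be divisible by window_size")
    return [tokens[start:start + window_size]
            for start in range(0, len(tokens), window_size)]


def shuffled_windows(tokens, window_size):
    """Group strided tokens: token index = offset * num_windows + window.

    Each resulting window takes one token from every stride of num_windows,
    so tokens that sat in different local windows now attend to each other.
    """
    if len(tokens) % window_size != 0:
        raise ValueError("sequence length must be divisible by window_size")
    num_windows = len(tokens) // window_size
    return [tokens[w::num_windows] for w in range(num_windows)]


tokens = list(range(8))
print(local_windows(tokens, 4))     # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(shuffled_windows(tokens, 4))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

Every shuffled window keeps the same size as a local window, so the attention cost per window is unchanged; only the membership differs. On a 2D feature map the same regrouping would be applied along height and width independently.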

Requirements

PyTorch==1.7.1
torchvision==0.8.2
timm==0.3.2

Apex is optional and can be installed for faster training:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Other Requirements

pip install opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8
pip install einops

Main Results

Results on ImageNet-1K

name	acc@1	#params	FLOPs	Throughput (images/s)	Weights
Shuffle-T	82.4	28M	4.6G	791	google drive
Shuffle-S	83.6	50M	8.9G	450	google drive
Shuffle-B	84.0	88M	15.6G	279	google drive

Usage

For classification on ImageNet-1K, to train from scratch, run:

python -m torch.distributed.launch --nproc_per_node <num-gpus> main.py \
--cfg <config-file> --data-path <imagenet-path> [--batch-size <batch-size> --output <output-dir>]

To evaluate, run:

python -m torch.distributed.launch --nproc_per_node <num-gpus> main.py --eval \
--cfg <config-file> --resume <checkpoint> --data-path <imagenet-path>

In progress

Semantic Segmentation
Instance Segmentation

Citing Shuffle Transformer

@article{huang2021shuffle,
 title={Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer},
 author={Huang, Zilong and Ben, Youcheng and Luo, Guozhong and Cheng, Pei and Yu, Gang and Fu, Bin},
 journal={arXiv preprint arXiv:2106.03650},
 year={2021}
}

Acknowledgement

Thanks to the open-source implementation of Swin-Transformer.
