The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer"

Last update: Nov 29, 2022

Shuffle Transformer

Introduction

Window-based Transformers, which compute self-attention within non-overlapping local windows, have recently demonstrated promising results on image classification, semantic segmentation, and object detection. However, little study has been devoted to cross-window connections, which are key to improving representation ability. Shuffle Transformer revisits the spatial shuffle as an efficient way to build connections among windows; it can be implemented by modifying two lines of code. Furthermore, a depth-wise convolution is introduced to complement the spatial shuffle by enhancing neighbor-window connections. The proposed architectures achieve excellent performance on a wide range of visual tasks, including image-level classification, object detection, and semantic segmentation.
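
To make the spatial shuffle idea concrete, here is a minimal sketch of window partitioning with and without shuffle. It is an illustration only, not the code in this repository: it uses NumPy on a single 2D map instead of PyTorch tensors, and the function name `window_partition` and its arguments are hypothetical. The two branches differ only in the order of the reshape axes.

```python
import numpy as np

def window_partition(x, ws, shuffle=False):
    """Split an (H, W) map into (H//ws * W//ws) windows of ws*ws tokens.

    Hypothetical helper for illustration; assumes H and W are divisible by ws.
    """
    H, W = x.shape
    nh, nw = H // ws, W // ws  # number of windows along each axis
    if shuffle:
        # Axes (ws, nh, ws, nw): tokens of one window are nh rows / nw columns
        # apart, so each window gathers tokens from across the whole map.
        x = x.reshape(ws, nh, ws, nw).transpose(1, 3, 0, 2)
    else:
        # Axes (nh, ws, nw, ws): tokens of one window are spatially adjacent.
        x = x.reshape(nh, ws, nw, ws).transpose(0, 2, 1, 3)
    return x.reshape(nh * nw, ws * ws)

x = np.arange(16).reshape(4, 4)
regular = window_partition(x, ws=2)
shuffled = window_partition(x, ws=2, shuffle=True)
print(regular[0])   # [0 1 4 5]   -> a local 2x2 patch
print(shuffled[0])  # [ 0  2  8 10] -> tokens taken from four different local windows
```

With shuffle enabled, every window mixes tokens that belong to different local windows, which is how information can flow across windows when regular and shuffled blocks alternate.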

Requirements

PyTorch==1.7.1
torchvision==0.8.2
timm==0.3.2

Apex is optional and is used for faster training:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Other Requirements

pip install opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8
pip install einops
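
As a convenience, the pinned versions above can be checked against the current environment with a short script. This is a sketch and not part of the repository: the helper name `check_pins` is hypothetical, it assumes Python 3.8+ for `importlib.metadata`, and it uses `torch`, which is the pip distribution name for PyTorch.

```python
from importlib import metadata

# Pins copied from the Requirements section; "torch" is the pip name for PyTorch.
PINNED = {"torch": "1.7.1", "torchvision": "0.8.2", "timm": "0.3.2"}

def check_pins(pins, get_version=metadata.version):
    """Return {package: (wanted, found)} for each missing or mismatched package."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        # Ignore local build suffixes such as "+cu110" when comparing.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = (wanted, found)
    return problems

if __name__ == "__main__":
    for name, (wanted, found) in check_pins(PINNED).items():
        print(f"{name}: want {wanted}, found {found}")
```

The script prints nothing when all three packages match their pins.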

Main Results

Results on ImageNet-1K

name	acc@1	#params	FLOPs	Throughput (images/s)	Weights
Shuffle-T	82.4	28M	4.6G	791	google drive
Shuffle-S	83.6	50M	8.9G	450	google drive
Shuffle-B	84.0	88M	15.6G	279	google drive

Usage

For classification on ImageNet-1K, to train from scratch, run:

python -m torch.distributed.launch --nproc_per_node <num-gpus> main.py \
--cfg <config-file> --data-path <imagenet-path> [--batch-size <batch-size> --output <output-dir>]

To evaluate, run:

python -m torch.distributed.launch --nproc_per_node <num-gpus> main.py --eval \
--cfg <config-file> --resume <checkpoint> --data-path <imagenet-path>

In progress

Semantic Segmentation
Instance Segmentation

Citing Shuffle Transformer

@article{huang2021shuffle,
 title={Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer},
 author={Huang, Zilong and Ben, Youcheng and Luo, Guozhong and Cheng, Pei and Yu, Gang and Fu, Bin},
 journal={arXiv preprint arXiv:2106.03650},
 year={2021}
}

Acknowledgement

Thanks to the open-source implementation of Swin-Transformer.
