This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Last update: Dec 30, 2022

Overview

Swin Transformer for Object Detection

This repo contains the supported code and configuration files to reproduce object detection results of Swin Transformer. It is based on mmdetection.

Updates

04/12/2021 Initial commits

Results and Models

Mask R-CNN

Backbone	Pretrain	Lr Schd	box mAP	mask mAP	#params	FLOPs	config	log	model
Swin-T	ImageNet-1K	3x	46.0	41.6	48M	267G	config	github/baidu	github/baidu
Swin-S	ImageNet-1K	3x	48.5	43.3	69M	359G	config	github/baidu	github/baidu

Cascade Mask R-CNN

Backbone	Pretrain	Lr Schd	box mAP	mask mAP	#params	FLOPs	config	log	model
Swin-T	ImageNet-1K	3x	50.4	43.7	86M	745G	config	github/baidu	github/baidu
Swin-S	ImageNet-1K	3x	51.9	45.0	107M	838G	config	github/baidu	github/baidu
Swin-B	ImageNet-1K	3x	51.9	45.0	145M	982G	config	github/baidu	github/baidu

RepPoints V2

Backbone	Pretrain	Lr Schd	box mAP	mask mAP	#params	FLOPs
Swin-T	ImageNet-1K	3x	50.0	-	45M	283G

Mask RepPoints V2

Backbone	Pretrain	Lr Schd	box mAP	mask mAP	#params	FLOPs
Swin-T	ImageNet-1K	3x	50.3	43.6	47M	292G

Notes:

Pre-trained models can be downloaded from Swin Transformer for ImageNet Classification.
Access code for baidu is swin.

Usage

Installation

Please refer to get_started.md for installation and dataset preparation.

Inference

# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm

Training

To train a detector with pre-trained models, run:

# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

For example, to train a Cascade Mask R-CNN model with a Swin-T backbone and 8 gpus, run:

tools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 8 --cfg-options model.pretrained=<PRETRAIN_MODEL>

Note: use_checkpoint is used to save GPU memory. Please refer to this page for more details.

Apex (optional):

We use apex for mixed precision training by default. To install apex, run:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

If you would like to disable apex, modify the type of runner as EpochBasedRunner and comment out the following code block in the configuration files:

# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)

Citing Swin Transformer

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Related tags

Overview

Swin Transformer for Object Detection

Updates

Results and Models

Mask R-CNN

Cascade Mask R-CNN

RepPoints V2

Mask RepPoints V2

Usage

Installation

Inference

Training

Apex (optional):

Citing Swin Transformer

Other Links

Owner

Swin Transformer

an implementation of 3D Ken Burns Effect from a Single Image using PyTorch

Course content and resources for the AIAIART course.

Deep generative modeling for time-stamped heterogeneous data, enabling high-fidelity models for a large variety of spatio-temporal domains.

A curated (most recent) list of resources for Learning with Noisy Labels

This is a repository for a Semantic Segmentation inference API using the Gluoncv CV toolkit

Overview of architecture and implementation of TEDS-Net, as described in MICCAI 2021: "TEDS-Net: Enforcing Diffeomorphisms in Spatial Transformers to Guarantee TopologyPreservation in Segmentations"

Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains

The AugNet Python module contains functions for the fast computation of image similarity.

Two types of Recommender System : Content-based Recommender System and Colaborating filtering based recommender system

PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem"(https://arxiv.org/pdf/1706.10059.pdf).

Toolchain to build Yoshi's Island from source code

Code and data for "TURL: Table Understanding through Representation Learning"

Get started learning C# with C# notebooks powered by .NET Interactive and VS Code.

Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks

Predictive Maintenance LSTM

NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.

Code release of paper Improving neural implicit surfaces geometry with patch warping

LaneDetectionAndLaneKeeping - Lane Detection And Lane Keeping

Collapse by Conditioning: Training Class-conditional GANs with Limited Data

PyTorch implemention of ICCV'21 paper SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation