Vision Transformer with Deformable Attention

This repository contains the code for the paper Vision Transformer with Deformable Attention [arXiv].

Introduction

Deformable attention models the relations among tokens under the guidance of the important regions in the feature maps. This flexible scheme lets the self-attention module focus on relevant regions and capture more informative features. On this basis, we present the Deformable Attention Transformer (DAT), a general backbone model with deformable attention for image classification and dense prediction tasks.
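To make the idea of attending to shifted locations concrete, here is a minimal pure-Python sketch of the sampling step only: features are read from a 2D map at reference points moved by offsets, using bilinear interpolation. It is an illustration, not the code in this repository. The names bilinear_sample and sample_deformed_points are hypothetical, the offsets are hard-coded rather than predicted by a network, and a real implementation would operate on batched tensors (for example with torch.nn.functional.grid_sample).

```python
import math


def bilinear_sample(feature_map, y, x):
    """Bilinearly interpolate a 2D grid (list of rows) at a fractional (y, x)."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp the location so sampling stays inside the map (an assumption of this sketch).
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - wx) * feature_map[y0][x0] + wx * feature_map[y0][x1]
    bottom = (1 - wx) * feature_map[y1][x0] + wx * feature_map[y1][x1]
    return (1 - wy) * top + wy * bottom


def sample_deformed_points(feature_map, reference_points, offsets):
    """Sample the map at each reference point shifted by its (dy, dx) offset."""
    return [
        bilinear_sample(feature_map, ry + dy, rx + dx)
        for (ry, rx), (dy, dx) in zip(reference_points, offsets)
    ]


# Toy 3x3 single-channel feature map.
feature_map = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]
reference_points = [(0.0, 0.0), (1.0, 1.0)]
# Hard-coded offsets for illustration; in a model these would be learned.
offsets = [(0.5, 0.5), (0.0, 0.5)]

sampled = sample_deformed_points(feature_map, reference_points, offsets)
print(sampled)  # [2.0, 4.5]
```

In an attention layer, values sampled this way would then serve as the source of keys and values, so each query attends to a small set of shifted locations instead of a fixed grid.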

Dependencies

NVIDIA GPU + CUDA 11.1
Python 3.8 (Anaconda recommended)
PyTorch == 1.8.0
timm
einops
yacs
termcolor
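A quick way to confirm that the Python packages above are installed is a short check like the one below. This is a hedged sketch rather than a script shipped with the repository: the helper name find_missing is hypothetical, and the check only tests whether each package can be located, not whether its version matches (for example PyTorch 1.8.0) or whether CUDA is usable.

```python
import importlib.util

# Import names of the packages listed under Dependencies.
REQUIRED = ["torch", "timm", "einops", "yacs", "termcolor"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
print("missing packages:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can then be installed with pip or conda before running the code.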

TODO

Classification pretrained models.
Object Detection codebase & models.
Semantic Segmentation codebase & models.
CUDA operators to accelerate sampling operations.

Acknowledgement

This code is developed on top of Swin Transformer. We thank the authors for their efficient and neat codebase.

Citation

If you find our work useful in your research, please consider citing:

@misc{xia2022vision,
      title={Vision Transformer with Deformable Attention}, 
      author={Zhuofan Xia and Xuran Pan and Shiji Song and Li Erran Li and Gao Huang},
      year={2022},
      eprint={2201.00520},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

[email protected]

