Vision Transformer with Deformable Attention

This repository contains the code for the paper Vision Transformer with Deformable Attention [arXiv].

Introduction

Deformable attention is proposed to model the relations among tokens effectively under the guidance of the important regions in the feature maps. This flexible scheme enables the self-attention module to focus on relevant regions and capture more informative features. On this basis, we present Deformable Attention Transformer (DAT), a general backbone model with deformable attention for both image classification and dense prediction tasks.
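To make the idea of "focusing on relevant regions" concrete, the sketch below shifts a few reference points by 2D offsets and reads the feature map at the shifted locations with bilinear interpolation. It is an illustration only and not the code in this repository: the function names `bilinear_sample` and `deformed_samples` are hypothetical, the offsets are hand-picked (in the model they would be predicted by a learned network), and the feature map is a single-channel toy example written in plain Python rather than PyTorch.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D list `feature` at fractional (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp so that shifted points never leave the feature map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def deformed_samples(feature, ref_points, offsets):
    """Sample `feature` at each reference point shifted by its offset."""
    return [
        bilinear_sample(feature, ry + oy, rx + ox)
        for (ry, rx), (oy, ox) in zip(ref_points, offsets)
    ]


# Toy 3x3 feature map (assumption: one channel, values chosen for readability).
feature = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]
ref_points = [(0.0, 0.0), (1.0, 1.0)]  # (row, col) positions on a regular grid
offsets = [(0.5, 0.5), (0.0, 1.0)]     # hand-picked stand-ins for learned offsets

samples = deformed_samples(feature, ref_points, offsets)
print(samples)
```

In an attention layer, features sampled this way would then be projected to keys and values, so each query attends to a small set of shifted locations instead of a fixed grid.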

Dependencies

NVIDIA GPU + CUDA 11.1
Python 3.8 (Anaconda recommended)
PyTorch == 1.8.0
timm
einops
yacs
termcolor
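
One way to confirm that the Python packages listed above are importable in the current environment is a small check like the following. This is a convenience sketch, not a script shipped with the repository; the helper name `missing_packages` is hypothetical, and the check does not verify versions or the CUDA setup.

```python
from importlib.util import find_spec

# Import names of the Python packages listed under Dependencies.
REQUIRED = ["torch", "timm", "einops", "yacs", "termcolor"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```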

TODO

Classification pretrained models.
Object Detection codebase & models.
Semantic Segmentation codebase & models.
CUDA operators to accelerate sampling operations.

Acknowledgement

This code is developed on top of Swin Transformer. We thank the authors for their efficient and neat codebase.

Citation

If you find our work useful in your research, please consider citing:

@misc{xia2022vision,
      title={Vision Transformer with Deformable Attention}, 
      author={Zhuofan Xia and Xuran Pan and Shiji Song and Li Erran Li and Gao Huang},
      year={2022},
      eprint={2201.00520},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

[email protected]
