DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

Last update: Dec 21, 2022

Overview

DPT

This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and models for the following tasks:

Image Classification: See classification/README.md for detailed instructions and information.

Object Detection: See detection/README.md for detailed instructions and information.

The paper has been released on [arXiv].

Introduction

Deformable Patch (DePatch) is a plug-and-play module. Rather than using predefined fixed patches, it learns in a data-driven way to adaptively split an image into patches with different positions and scales. As a result, each patch better preserves the semantics of the region it covers.
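The idea of a patch with an adjustable position and scale can be illustrated with a small, dependency-free sketch. This is not the code in this repository (which relies on a CUDA operator); it only assumes that each patch is described by a center, an offset, and a half-width/half-height, and that features are read from the patch by bilinear sampling on a k x k grid. The function names, the parameters `dx`, `dy`, `half_w`, `half_h`, and the hand-picked values below are hypothetical; in a trained model the offset and scale would be predicted from the input rather than set by hand.

```python
import math


def bilinear(img, x, y):
    """Bilinearly interpolate a 2D grid (list of rows) at a continuous (x, y)."""
    h, w = len(img), len(img[0])
    # Clamp so that samples outside the image reuse the border values.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def sample_patch(img, cx, cy, dx, dy, half_w, half_h, k):
    """Sample k x k points uniformly inside a shifted, rescaled patch.

    (cx, cy) is the predefined patch center; (dx, dy) is the offset and
    (half_w, half_h) the scale. All four deformation values are hypothetical
    inputs here, standing in for values a model would predict.
    """
    left = cx + dx - half_w
    top = cy + dy - half_h
    points = []
    for i in range(k):
        for j in range(k):
            # Sample at the center of each of the k x k cells of the patch.
            px = left + (j + 0.5) * (2 * half_w) / k
            py = top + (i + 0.5) * (2 * half_h) / k
            points.append(bilinear(img, px, py))
    return points


# Toy 8 x 8 "image" whose value encodes its own coordinates: x + 10 * y.
image = [[x + 10 * y for x in range(8)] for y in range(8)]

# Fixed patch: no offset, original 2 x 2 extent.
fixed = sample_patch(image, cx=3.5, cy=3.5, dx=0.0, dy=0.0,
                     half_w=1.0, half_h=1.0, k=2)

# Deformed patch: shifted right and up, and twice as wide.
deformed = sample_patch(image, cx=3.5, cy=3.5, dx=1.0, dy=-0.5,
                        half_w=2.0, half_h=1.0, k=2)
```

Both calls return the same number of samples (k x k), so a deformed patch can feed the same downstream projection as a fixed one; only the locations being read change.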

This repository provides code and models for a Deformable Patch-based Transformer (DPT). As this field is developing rapidly, we would be glad to see DePatch applied to other recent architectures to promote further research.

Main Results

Image Classification

Training commands and pretrained models are provided >>> here <<<.

Method	#Params (M)	FLOPs (G)	Acc@1
DPT-Tiny	15.2	2.1	77.4
DPT-Small	26.4	4.0	81.0
DPT-Medium	46.1	6.9	81.9

Object Detection

Coming soon.

Citation

@inproceedings{chenDPT21,
  title = {DPT: Deformable Patch-based Transformer for Visual Recognition},
  author = {Zhiyang Chen and Yousong Zhu and Chaoyang Zhao and Guosheng Hu and Wei Zeng and Jinqiao Wang and Ming Tang},
  booktitle={Proceedings of the ACM International Conference on Multimedia},
  year={2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Our implementation is mainly based on PVT. The CUDA operator is borrowed from Deformable-DETR. You may refer to these repositories for further information.

Owner

CASIA-IVA-Lab
