DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

Last update: Dec 21, 2022

DPT

This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and models for the following tasks:

Image Classification: see classification/README.md for detailed instructions and information.

Object Detection: see detection/README.md for detailed instructions and information.

The paper has been released on [arXiv].

Introduction

Deformable Patch (DePatch) is a plug-and-play module. Rather than using predefined fixed patches, it learns in a data-driven way to adaptively split an image into patches with different positions and scales. In this way, our method better preserves the semantics within each patch.
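To make the idea of patches with "different positions and scales" concrete, here is a minimal, dependency-free sketch. It is not the code in this repository (which relies on a CUDA operator); it only illustrates one way a patch could be shifted and resized and then resampled to a fixed number of values. The function names, the pixel-unit coordinate convention, the uniform k x k sampling grid, and the use of bilinear interpolation are all assumptions made for illustration. The offset (dx, dy) and scale (sw, sh) are passed in by hand here, whereas in a real model they would be predicted by a learned layer.

```python
import math


def sample_points(cx, cy, dx, dy, sw, sh, k):
    """Return k*k (x, y) sampling locations for one deformable patch.

    The patch box is centered at (cx + dx, cy + dy) with width sw and
    height sh, all in pixel units (an assumed convention).
    """
    left = cx + dx - sw / 2.0
    top = cy + dy - sh / 2.0
    points = []
    for row in range(k):
        for col in range(k):
            # Sample at the center of each cell of a k x k grid over the box.
            points.append((left + (col + 0.5) * sw / k,
                           top + (row + 0.5) * sh / k))
    return points


def bilinear(img, x, y):
    """Bilinearly interpolate a 2D list-of-lists image at (x, y).

    Coordinates outside the image are clamped to the border.
    """
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def deformable_patch(img, cx, cy, dx, dy, sw, sh, k):
    """Sample k*k values from a shifted, rescaled patch of img."""
    return [bilinear(img, px, py)
            for px, py in sample_points(cx, cy, dx, dy, sw, sh, k)]
```

With zero offset and a scale equal to the fixed patch size, this reduces to sampling a regular patch; a non-zero offset or a different scale moves or resizes the sampled region while the number of sampled values stays k*k, so the downstream layers see a fixed-size input either way.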

This repository provides code and models for a Deformable Patch-based Transformer (DPT). As this field is developing rapidly, we hope to see DePatch applied to other recent architectures and to promote further research.

Main Results

Image Classification

For training commands and pretrained models, see classification/README.md.

Method	#Params (M)	FLOPs (G)	Acc@1 (%)
DPT-Tiny	15.2	2.1	77.4
DPT-Small	26.4	4.0	81.0
DPT-Medium	46.1	6.9	81.9

Object Detection

Coming soon.

Citation

@inproceedings{chenDPT21,
  title = {DPT: Deformable Patch-based Transformer for Visual Recognition},
  author = {Zhiyang Chen and Yousong Zhu and Chaoyang Zhao and Guosheng Hu and Wei Zeng and Jinqiao Wang and Ming Tang},
  booktitle={Proceedings of the ACM International Conference on Multimedia},
  year={2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Our implementation is mainly based on PVT. The CUDA operator is borrowed from Deformable-DETR. Please refer to these repositories for further information.

Owner

CASIA-IVA-Lab
