This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

Last update: Jan 03, 2023

Related tags

Deep Learning fine-grained-recognition

Overview

TransFG: A Transformer Architecture for Fine-grained Recognition

Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-grained Recognition

Implementation based on DeiT pretrained on ImageNet-1K with distillation fine-tuning will be released soon.

Framework

Dependencies:

Python 3.7.3
PyTorch 1.5.1
torchvision 0.6.1
ml_collections

Usage

1. Download Google pre-trained ViT models

Get models in this link: ViT-B_16, ViT-B_32...

wget https://storage.googleapis.com/vit_models/imagenet21k/{MODEL_NAME}.npz

2. Prepare data

In the paper, we use data from 5 publicly available datasets:

Please download them from the official websites and put them in the corresponding folders.

3. Install required packages

Install dependencies with the following command:

pip3 install -r requirements.txt

4. Train

To train TransFG on CUB-200-2011 dataset with 4 gpus in FP-16 mode for 10000 steps run:

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py --dataset CUB_200_2011 --split overlap --num_steps 10000 --fp16 --name sample_run

Citation

If you find our work helpful in your research, please cite it as:

@article{he2021transfg,
  title={TransFG: A Transformer Architecture for Fine-grained Recognition},
  author={He, Ju and Chen, Jieneng and Liu, Shuai and Kortylewski, Adam and Yang, Cheng and Bai, Yutong and Wang, Changhu and Yuille, Alan},
  journal={arXiv preprint arXiv:2103.07976},
  year={2021}
}

Acknowledgement

Many thanks to ViT-pytorch for the PyTorch reimplementation of An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

Related tags

Overview

TransFG: A Transformer Architecture for Fine-grained Recognition

Framework

Dependencies:

Usage

1. Download Google pre-trained ViT models

2. Prepare data

3. Install required packages

4. Train

Citation

Acknowledgement

Owner

Ju He

Official PyTorch code for "BAM: Bottleneck Attention Module (BMVC2018)" and "CBAM: Convolutional Block Attention Module (ECCV2018)"

Dynamic Capacity Networks using Tensorflow

A Fast and Accurate One-Stage Approach to Visual Grounding, ICCV 2019 (Oral)

"Segmenter: Transformer for Semantic Segmentation" reproduced via mmsegmentation

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

A Comparative Review of Recent Kinect-Based Action Recognition Algorithms (TIP2020, Matlab codes)

ULMFiT for Genomic Sequence Data

[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code :star2:. Semi-supervised video object segmentation evaluation.

1st Solution For NeurIPS 2021 Competition on ML4CO Dual Task

PyTorch implementation of the paper Ultra Fast Structure-aware Deep Lane Detection

Official Pytorch implementation for "End2End Occluded Face Recognition by Masking Corrupted Features, TPAMI 2021"

Boston House Prediction Valuation Tool

Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer"

Official Repository for our ICCV2021 paper: Continual Learning on Noisy Data Streams via Self-Purified Replay

R3Det based on mmdet 2.19.0

Most popular metrics used to evaluate object detection algorithms.

The final project of "Applying AI to EHR Data" of "AI for Healthcare" nanodegree - Udacity.

A PyTorch implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

[NeurIPS 2021] Source code for the paper "Qu-ANTI-zation: Exploiting Neural Network Quantization for Achieving Adversarial Outcomes"

A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population