"Exploring Vision Transformers for Fine-grained Classification" at CVPRW FGVC8

Last update: Dec 06, 2022

Overview

FGVC8

The paper "Exploring Vision Transformers for Fine-grained Classification" was presented at the Eighth Workshop on Fine-Grained Visual Categorization (FGVC8) at CVPR 2021, on June 25th.

Abstract

Existing computer vision research in categorization struggles with fine-grained attribute recognition due to the inherently high intra-class variance and low inter-class variance. SOTA methods tackle this challenge by locating the most informative image regions and relying on them to classify the complete image. The most recent work, Vision Transformer (ViT), shows strong performance in both traditional and fine-grained classification tasks.

In this work, we propose a multi-stage ViT framework for fine-grained image classification tasks, which localizes the informative image regions without requiring architectural changes using the inherent multi-head self-attention mechanism. We also introduce attention-guided augmentations for improving the model's capabilities.
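The localization idea above can be sketched in a few lines. This is not the authors' code: it is a minimal illustration that assumes a 2D attention map over the patch grid has already been extracted from the ViT (for example, the attention of the class token averaged over heads). The function name attention_bounding_box, the mean-based threshold, and the threshold_ratio parameter are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def attention_bounding_box(
    attn: List[List[float]],
    patch_size: int,
    threshold_ratio: float = 1.0,
) -> Tuple[int, int, int, int]:
    """Return a pixel box (left, top, right, bottom) around high-attention patches.

    attn is a rows x cols grid of attention scores, one per image patch.
    A patch counts as informative when its score is strictly greater than
    threshold_ratio * mean(attn). This thresholding rule is an assumption
    made for illustration, not the rule used in the paper.
    """
    values = [v for row in attn for v in row]
    threshold = threshold_ratio * sum(values) / len(values)

    # Row and column indices of every patch above the threshold.
    rows = [r for r, row in enumerate(attn) if any(v > threshold for v in row)]
    cols = [c for row in attn for c, v in enumerate(row) if v > threshold]

    if not rows:
        # No patch stands out: fall back to the full image.
        return 0, 0, len(attn[0]) * patch_size, len(attn) * patch_size

    return (
        min(cols) * patch_size,
        min(rows) * patch_size,
        (max(cols) + 1) * patch_size,
        (max(rows) + 1) * patch_size,
    )


# Toy 4x4 attention map whose mass sits on the central 2x2 patches.
toy_attn = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(attention_bounding_box(toy_attn, patch_size=16))  # (16, 16, 48, 48)
```

In a multi-stage setup, a box like this could be used to crop the input and feed the crop to a later stage, or to drive an augmentation that keeps or erases the attended region. How the stages are actually combined is described in the paper, not in this sketch.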

We demonstrate the value of our approach by experimenting on four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology. We also illustrate our model's interpretability via qualitative results.

Instructions

Upcoming

Citation

If you find our results interesting, or if you use our code or ideas, please consider citing our work:

@misc{conde2021exploring,
      title={Exploring Vision Transformers for Fine-grained Classification}, 
      author={Marcos V. Conde and Kerem Turgutlu},
      year={2021},
      eprint={2106.10587},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Owner

Marcos V. Conde
