A Novel Plug-in Module for Fine-grained Visual Classification

Last update: Dec 20, 2022

Overview

paper url: https://arxiv.org/abs/2202.03822

We propose a novel plug-in module that can be integrated into many common backbones, including CNN-based and Transformer-based networks, to provide strongly discriminative regions. The plug-in module can output pixel-level feature maps and fuse filtered features to enhance fine-grained visual classification. Experimental results show that the proposed plug-in module outperforms state-of-the-art approaches and significantly improves the accuracy to 92.77% and 92.83% on CUB200-2011 and NABirds, respectively.

1. Environment setting

Install the requirements.
Replace your timm/ folder with our timm/ folder (for ViT or Swin-T).
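
To replace the timm/ folder you first need to know where it lives. The following is a minimal sketch, assuming the folder to be replaced is the copy of timm installed in the active Python environment (the README does not state this explicitly). The helper name `find_package_dir` is hypothetical and not part of this repository.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_dir(name: str) -> Optional[Path]:
    """Return the directory of an installed top-level package, or None.

    None is returned when the package is not installed or when the name
    refers to a plain module rather than a package directory.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


if __name__ == "__main__":
    # Assumption: the installed timm package is the folder to be replaced.
    timm_dir = find_package_dir("timm")
    if timm_dir is None:
        print("timm is not installed in this environment")
    else:
        print(f"timm is installed at: {timm_dir}")
```

Backing up the printed directory before overwriting it makes it easy to restore the stock timm later.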

Prepare dataset

In this paper, we use two large bird datasets: CUB200-2011 and NABirds.

Our pretrained model

Download the pretrained model from this url: https://drive.google.com/drive/folders/1ivMJl4_EgE-EVU_5T8giQTwcNQ6RPtAo?usp=sharing

backup/ is the path to our pretrained models.
resnet50_miil_21k.pth and vit_base_patch16_224_miil_21k.pth are ImageNet-21K pretrained models (place these files under models/). Thanks to https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/MODEL_ZOO.md!
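
A quick check that the ImageNet-21K weights are in place can save a failed run. This is a small sketch, not part of the repository: the function name `missing_weights` is hypothetical, and it assumes models/ is resolved relative to the directory you run the scripts from.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the README; the models/ location is relative to the
# current working directory (an assumption of this sketch).
EXPECTED_WEIGHTS = (
    "resnet50_miil_21k.pth",
    "vit_base_patch16_224_miil_21k.pth",
)


def missing_weights(models_dir: str = "models",
                    names: Iterable[str] = EXPECTED_WEIGHTS) -> List[str]:
    """Return the expected weight files that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing pretrained files under models/:", ", ".join(missing))
    else:
        print("All expected pretrained files were found.")
```

Only the backbone you plan to use is needed, so a missing file for the other backbone may not be a problem.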

OS

Windows10
Ubuntu20.04
macOS

2. Train

configuration file: config.py

python train.py --train_root "./CUB200-2011/train/" --val_root "./CUB200-2011/test/"
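
Before launching a long training run, it can help to sanity-check the folders passed as --train_root and --val_root. The sketch below assumes each root contains one subfolder per class with image files inside; that layout is an assumption of this example, so check the repository's dataset loader for the layout it actually expects. The helper `images_per_class` and the suffix list are hypothetical.

```python
from pathlib import Path
from typing import Dict

# Assumed image file extensions; extend if your dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_per_class(root: str) -> Dict[str, int]:
    """Count image files in each immediate subfolder (one per class) of root."""
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"dataset root not found: {root}")
    counts: Dict[str, int] = {}
    for class_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    for split in ("./CUB200-2011/train/", "./CUB200-2011/test/"):
        try:
            counts = images_per_class(split)
        except FileNotFoundError as err:
            print(err)
            continue
        empty = [name for name, n in counts.items() if n == 0]
        print(f"{split}: {len(counts)} classes, {sum(counts.values())} images")
        if empty:
            print("  classes with no images:", ", ".join(empty))
```

A class count that differs between the two splits, or a class with no images, usually points to an incomplete copy of the dataset.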

3. Evaluation

configuration file: config_eval.py

python eval.py --pretrained_path "./backup/CUB200/best.pth" --val_root "./CUB200-2011/test/"

4. Visualization

configuration file: config_plot.py

python plot_heat.py --pretrained_path "./backup/CUB200/best.pth" --img_path "./img/001.png"

Acknowledgment

Thanks to timm for the PyTorch implementation.
This work was financially supported by National Taiwan Normal University (NTNU) within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan, sponsored by the Ministry of Science and Technology, Taiwan, R.O.C. under Grant nos. MOST 110-2221-E-003-026, 110-2634-F-003-007, and 110-2634-F-003-006. In addition, we thank the National Center for High-performance Computing (NCHC) for providing computational and storage resources.

Owner

ChouPoYung
