SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition

Abstract

Explainable artificial intelligence has been gaining attention in the past few years. However, most existing methods are based on gradients or intermediate features, which are not directly involved in the decision-making process of the classifier. In this paper, we propose a slot attention-based classifier called SCOUTER for transparent yet accurate classification. SCOUTER differs from other attention-based methods in two major ways: (a) SCOUTER's explanation is directly involved in the final confidence for each category, offering a more intuitive interpretation, and (b) every category has a corresponding positive or negative explanation, which tells "why the image is of a certain category" or "why the image is not of a certain category." We design a new loss tailored for SCOUTER that controls the model's behavior to switch between positive and negative explanations, as well as the size of the explanatory regions. Experimental results show that SCOUTER gives better visual explanations while maintaining good accuracy on small and medium-sized datasets.
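The loss described above combines a classification term with a term that controls the size of the explanatory regions. The exact formulation is given in the paper and the training code, neither of which is reproduced in this README, so the following is only a minimal sketch. It assumes the objective is softmax cross-entropy plus lambda_value times the mean attention value over all class slots and spatial positions. The function names (cross_entropy, area_loss, scouter_loss) are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single sample, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def area_loss(attn):
    """Explanation-size term (ASSUMPTION: mean attention over all maps).

    attn: one attention map per class slot, each a list of values in [0, 1]
    over spatial positions. Larger attended regions give a larger value.
    """
    values = [a for class_map in attn for a in class_map]
    return sum(values) / len(values)


def scouter_loss(logits, target, attn, lambda_value=1.0):
    """Hypothetical SCOUTER-style objective: CE + lambda * explanation size."""
    return cross_entropy(logits, target) + lambda_value * area_loss(attn)


# Example: two classes, two spatial positions per attention map.
logits = [2.0, 0.5]
attn = [[0.5, 0.5], [0.0, 0.0]]
print(scouter_loss(logits, target=0, attn=attn, lambda_value=1.0))
```

With lambda_value set to 0 the sketch reduces to plain cross-entropy; larger values penalize large attended regions more strongly, trading some classification signal for smaller explanations.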

Model Structure

[Figure: SCOUTER model structure]

SCOUTER is built on top of the recently emerged slot attention, which offers an object-centric approach to image representation. Based on this approach, we propose an explainable slot attention (xSlot) module. The output of the xSlot module is used directly as the confidence value for each category, so the commonly used fully connected (FC) layer-based classifier is no longer necessary. The whole network, including the backbone, is trained with the SCOUTER loss, which controls the size of the explanatory regions and switches between positive and negative explanations.
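To make the idea of "explanation as confidence" concrete, here is a heavily simplified sketch. It assumes one slot (query vector) per class, attention computed with a sigmoid over scaled dot products between each slot and each spatial feature, and a class confidence equal to a sign e (+1 for positive, -1 for negative explanations, presumably what --loss_status selects) times the total attention mass of that class's map. The iterative slot updates of slot attention, the learned projections, and the repository's exact scoring are all omitted; xslot_confidence is a hypothetical name, not the repository's API.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def xslot_confidence(slots, features, e=1):
    """Hypothetical sketch of explanation-based class confidences.

    slots:    one query vector per class, shape (num_classes, d)
    features: one feature vector per spatial position, shape (positions, d)
    e:        +1 for positive explanations, -1 for negative ones
    Returns (confidences, attention_maps).
    """
    if e not in (1, -1):
        raise ValueError("e must be +1 or -1")
    scale = 1.0 / math.sqrt(len(features[0]))
    # ASSUMPTION: sigmoid (not softmax) normalization, so each class map
    # is computed independently of the other classes.
    attn = [[sigmoid(scale * dot(s, f)) for f in features] for s in slots]
    # ASSUMPTION: confidence = sign * total attention of the class map.
    # The map itself is the score, so no FC classifier is needed.
    conf = [e * sum(class_map) for class_map in attn]
    return conf, attn


slots = [[1.0, 0.0], [0.0, 1.0]]
features = [[2.0, 0.0], [2.0, 0.0]]
conf, attn = xslot_confidence(slots, features, e=1)
print(conf)
```

With e = -1 the predicted class (the argmax of the confidences) becomes the one whose map carries the least attention, which matches the reading of a negative explanation as "why the image is not of a certain category" for every other class.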

Usage

Enable distributed training (if desired)
python -m torch.distributed.launch --nproc_per_node=4 --use_env train.py --world_size 4
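With --use_env, torch.distributed.launch passes the local rank to each worker through the LOCAL_RANK environment variable instead of a --local_rank command-line argument, and it also exports RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT. How train.py reads these values is not shown in this README; the sketch below (distributed_settings is a hypothetical helper) only illustrates reading them with single-process fallbacks.

```python
import os
from typing import Mapping, Optional


def distributed_settings(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Read the per-process settings exported by the launcher.

    With --use_env, torch.distributed.launch sets RANK, LOCAL_RANK and
    WORLD_SIZE in each worker's environment. Missing variables fall back
    to single-process defaults so the same script also runs standalone.
    """
    env = os.environ if environ is None else environ
    world_size = int(env.get("WORLD_SIZE", "1"))
    return {
        "rank": int(env.get("RANK", "0")),
        "local_rank": int(env.get("LOCAL_RANK", "0")),
        "world_size": world_size,
        "distributed": world_size > 1,
    }


if __name__ == "__main__":
    print(distributed_settings())
```

On a single machine, the --world_size 4 passed to train.py is presumably meant to match --nproc_per_node=4. In recent PyTorch releases, torch.distributed.launch is deprecated in favour of torchrun, which always exports these variables (the behaviour --use_env enables).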

ImageNet

Training the base model for the ImageNet dataset
python train.py --dataset ImageNet --model resnest26d --batch_size 70 --epochs 20 \
--num_classes 10 --use_slot false \
--vis false --channel 2048 --freeze_layers 0 \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/
Positive SCOUTER for the ImageNet dataset
python train.py --dataset ImageNet --model resnest26d --batch_size 70 --epochs 20 \
--num_classes 10 --use_slot true --use_pre false --loss_status 1 --slots_per_class 1 \
--power 2 --to_k_layer 3 --lambda_value 1 --vis false --channel 2048 --freeze_layers 0 \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/
Negative SCOUTER for the ImageNet dataset
python train.py --dataset ImageNet --model resnest26d --batch_size 70 --epochs 20 \
--num_classes 10 --use_slot true --use_pre false --loss_status -1 --slots_per_class 1 \
--power 2 --to_k_layer 3 --lambda_value 1 --vis false --channel 2048 --freeze_layers 0 \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/
Visualization of positive SCOUTER for the ImageNet dataset
python test.py --dataset ImageNet --model resnest26d --batch_size 70 --epochs 20 \
--num_classes 10 --use_slot true --use_pre false --loss_status 1 --slots_per_class 1 \
--power 2 --to_k_layer 3 --lambda_value 1 --vis true --channel 2048 --freeze_layers 0 \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/
Visualization of negative SCOUTER for the ImageNet dataset
python test.py --dataset ImageNet --model resnest26d --batch_size 70 --epochs 20 \
--num_classes 10 --use_slot true --use_pre false --loss_status -1 --slots_per_class 1 \
--power 2 --to_k_layer 3 --lambda_value 1 --vis true --channel 2048 --freeze_layers 0 \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/
Visualization using TorchCAM for the ImageNet dataset
python torchcam_vis.py --dataset ImageNet --model resnest26d --batch_size 70 \
--num_classes 10 --grad true --use_pre true \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/ \
--grad_min_level 0
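The commands above pass boolean options as the strings true and false (for example --use_slot false). The argument parser of train.py is not reproduced in this README, so the following is only a sketch of how such flags are commonly handled with Python's argparse. The str2bool helper, the defaults, the choices and the subset of flags shown are assumptions, not the repository's code.

```python
import argparse


def str2bool(value):
    """Convert 'true'/'false'-style strings to bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected true/false, got {value!r}")


def build_parser():
    # Only a handful of the flags used above; types and defaults are assumed.
    parser = argparse.ArgumentParser(description="SCOUTER flag sketch")
    parser.add_argument("--use_slot", type=str2bool, default=False)
    parser.add_argument("--use_pre", type=str2bool, default=False)
    parser.add_argument("--vis", type=str2bool, default=False)
    parser.add_argument("--loss_status", type=int, default=1, choices=[1, -1])
    parser.add_argument("--slots_per_class", type=int, default=1)
    parser.add_argument("--lambda_value", type=float, default=1.0)
    return parser


args = build_parser().parse_args(
    ["--use_slot", "true", "--loss_status", "-1", "--lambda_value", "1."]
)
print(args.use_slot, args.loss_status, args.lambda_value)
```

A plain type=bool would not work for these flags, because bool("false") is True in Python. Values such as --loss_status -1 and --lambda_value .2 parse as expected: argparse treats a negative-number-like string as a value when the parser defines no options that look like negative numbers.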

MNIST Dataset

Pre-training for the MNIST dataset
python train.py --dataset MNIST --model resnet18 --batch_size 64 --epochs 10 \
--num_classes 10 --use_slot false --vis false --aug false
Positive SCOUTER for the MNIST dataset
python train.py --dataset MNIST --model resnet18 --batch_size 64 --epochs 10 \
--num_classes 10 --use_slot true --use_pre true --loss_status 1 --slots_per_class 1 \
--power 1 --to_k_layer 1 --lambda_value 1. --vis false --channel 512 --aug false
Negative SCOUTER for the MNIST dataset
python train.py --dataset MNIST --model resnet18 --batch_size 64 --epochs 10 \
--num_classes 10 --use_slot true --use_pre false --loss_status -1 --slots_per_class 2 \
--power 2 --to_k_layer 1 --lambda_value 1.5 --vis false --channel 512 --aug false --freeze_layers 3
Visualization of positive SCOUTER for the MNIST dataset
python test.py --dataset MNIST --model resnet18 --batch_size 64 --epochs 10 \
--num_classes 10 --use_slot true --use_pre true --loss_status 1 --slots_per_class 1 \
--power 1 --to_k_layer 1 --lambda_value 1. --vis true --channel 512 --aug false
Visualization of negative SCOUTER for the MNIST dataset
python test.py --dataset MNIST --model resnet18 --batch_size 64 --epochs 10 \
--num_classes 10 --use_slot true --use_pre false --loss_status -1 --slots_per_class 2 \
--power 2 --to_k_layer 1 --lambda_value 1.5 --vis true --channel 512 --aug false --freeze_layers 3
Visualization using TorchCAM for the MNIST dataset
python torchcam_vis.py --dataset MNIST --model resnet18 --batch_size 64 \
--num_classes 10 --grad true --use_pre true
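Several configurations use --slots_per_class greater than 1 (2 for the negative MNIST model, 3 for Con-Text, and 5 or 3 for CUB-200), which presumably gives the xSlot module num_classes × slots_per_class slots, e.g. 10 × 2 = 20 for negative MNIST. One way to reduce per-slot scores to one confidence per class is to sum the scores of each class's slots. The sketch below does that, assuming a class-major slot layout; both the layout and the summation are assumptions, and aggregate_slots is a hypothetical helper.

```python
def aggregate_slots(slot_scores, num_classes, slots_per_class):
    """Sum the scores of the slots that belong to the same class.

    ASSUMPTION: slots are laid out class-major, i.e. slots
    [c * slots_per_class, (c + 1) * slots_per_class) belong to class c.
    """
    expected = num_classes * slots_per_class
    if len(slot_scores) != expected:
        raise ValueError(f"expected {expected} slot scores, got {len(slot_scores)}")
    return [
        sum(slot_scores[c * slots_per_class:(c + 1) * slots_per_class])
        for c in range(num_classes)
    ]


print(aggregate_slots([1.0, 2.0, 3.0, 4.0], num_classes=2, slots_per_class=2))
```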

Con-Text Dataset

Pre-training for the Con-Text dataset
python train.py --dataset ConText --model resnest26d --batch_size 200 --epochs 100 \
--num_classes 30 --use_slot false --vis false \
--dataset_dir ../data/con-text/JPEGImages/
Positive SCOUTER for the Con-Text dataset
python train.py --dataset ConText --model resnest26d --batch_size 200 --epochs 100 \
--num_classes 30 --use_slot true --use_pre true --loss_status 1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value .2 --vis false --channel 2048 \
--dataset_dir ../data/con-text/JPEGImages/
Negative SCOUTER for the Con-Text dataset
python train.py --dataset ConText --model resnest26d --batch_size 200 --epochs 100 \
--num_classes 30 --use_slot true --use_pre true --loss_status -1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value 1. --vis false --channel 2048 \
--dataset_dir ../data/con-text/JPEGImages/
Visualization of positive SCOUTER for the Con-Text dataset
python test.py --dataset ConText --model resnest26d --batch_size 200 --epochs 100 \
--num_classes 30 --use_slot true --use_pre true --loss_status 1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value 1. --vis true --channel 2048 \
--dataset_dir ../data/con-text/JPEGImages/
Visualization of negative SCOUTER for the Con-Text dataset
python test.py --dataset ConText --model resnest26d --batch_size 200 --epochs 100 \
--num_classes 30 --use_slot true --use_pre true --loss_status -1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value 1. --vis true --channel 2048 \
--dataset_dir ../data/con-text/JPEGImages/
Visualization using TorchCAM for the Con-Text dataset
python torchcam_vis.py --dataset ConText --model resnest26d --batch_size 200 \
--num_classes 30 --grad true --use_pre true \
--dataset_dir ../data/con-text/JPEGImages/
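Running test.py with --vis true visualizes the learned attention. The backbone's feature map, and therefore each slot's attention map, is much coarser than the input image, so a map has to be rescaled before it can be drawn over the image. A minimal sketch of that step follows, assuming min-max normalization to 0..255 and nearest-neighbour upsampling; test.py may use a different interpolation or colour map, and to_heatmap is a hypothetical name.

```python
def to_heatmap(attn_map, out_h, out_w):
    """Min-max normalize a 2-D attention map to integers in 0..255 and
    upsample it to out_h x out_w with nearest-neighbour sampling."""
    flat = [v for row in attn_map for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    in_h, in_w = len(attn_map), len(attn_map[0])
    heat = []
    for y in range(out_h):
        src_row = attn_map[y * in_h // out_h]  # nearest source row
        row = []
        for x in range(out_w):
            v = src_row[x * in_w // out_w]  # nearest source column
            # A constant map carries no spatial information; render it as 0.
            row.append(0 if span == 0 else round(255 * (v - lo) / span))
        heat.append(row)
    return heat


for row in to_heatmap([[0.1, 0.9], [0.5, 0.3]], 4, 4):
    print(row)
```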

CUB-200 Dataset

Pre-training for the CUB-200 dataset
python train.py --dataset CUB200 --model resnest50d --batch_size 64 --epochs 150 \
--num_classes 25 --use_slot false --vis false --channel 2048 \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
Positive SCOUTER for the CUB-200 dataset
python train.py --dataset CUB200 --model resnest50d --batch_size 64 --epochs 150 \
--num_classes 25 --use_slot true --use_pre true --loss_status 1 --slots_per_class 5 \
--power 2 --to_k_layer 3 --lambda_value 10 --vis false --channel 2048 --freeze_layers 2 \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
Negative SCOUTER for the CUB-200 dataset
python train.py --dataset CUB200 --model resnest50d --batch_size 64 --epochs 150 \
--num_classes 25 --use_slot true --use_pre true --loss_status -1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value 1. --vis false --channel 2048 --freeze_layers 2 \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
Visualization of positive SCOUTER for the CUB-200 dataset
python test.py --dataset CUB200 --model resnest50d --batch_size 64 --epochs 150 \
--num_classes 25 --use_slot true --use_pre true --loss_status 1 --slots_per_class 5 \
--power 2 --to_k_layer 3 --lambda_value 10 --vis true --channel 2048 --freeze_layers 2 \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
Visualization of negative SCOUTER for the CUB-200 dataset
python test.py --dataset CUB200 --model resnest50d --batch_size 64 --epochs 150 \
--num_classes 25 --use_slot true --use_pre true --loss_status -1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value 1. --vis true --channel 2048 --freeze_layers 2 \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
Visualization using TorchCAM for the CUB-200 dataset
python torchcam_vis.py --dataset CUB200 --model resnest50d --batch_size 150 \
--num_classes 25 --grad true --use_pre true \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
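The CUB-200 commands set --num_classes 25 although CUB-200-2011 contains 200 bird species, which suggests training on a subset of the classes (the ImageNet commands similarly use --num_classes 10). The sketch below shows one way to select such a subset, assuming the first N classes by id are kept and that class names come from the classes.txt file of the CUB-200-2011 release, whose lines have the form "<class_id> <class_name>". How the repository's data loader actually picks the classes is not shown in this README; first_n_classes is a hypothetical helper.

```python
def first_n_classes(class_lines, num_classes):
    """Keep classes whose 1-based id is <= num_classes.

    class_lines: lines of the form '<class_id> <class_name>'.
    Returns a dict mapping 0-based training labels to class names.
    """
    selected = {}
    for line in class_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        class_id, name = line.split(maxsplit=1)
        class_id = int(class_id)
        if class_id <= num_classes:
            selected[class_id - 1] = name  # remap to 0-based labels
    return selected


# Sample lines in the classes.txt format (not read from disk here).
sample = ["1 001.Black_footed_Albatross", "2 002.Laysan_Albatross", "3 003.Sooty_Albatross"]
print(first_n_classes(sample, num_classes=2))
```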

Acknowledgements

This work was supported by the Council for Science, Technology and Innovation (CSTI), Cross-ministerial Strategic Innovation Promotion Program (SIP), "Innovative AI Hospital System" (Funding Agency: National Institute of Biomedical Innovation, Health and Nutrition (NIBIOHN)).

Publication

If you use this work, please consider citing the following paper.

@inproceedings{li2021scouter,
 author = {Liangzhi Li and Bowen Wang and Manisha Verma and Yuta Nakashima and Ryo Kawasaki and Hajime Nagahara},
 booktitle = {IEEE International Conference on Computer Vision (ICCV)},
 title = {SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition},
 year = {2021}
}
Owner
Bowen Wang