Alignment Attention Fusion framework for Few-Shot Object Detection

Last update: Dec 16, 2022

Overview

AAF framework

Framework generalities

This repository contains the code of the AAF framework proposed in this paper. The goal of this work is a flexible framework for implementing various attention mechanisms for Few-Shot Object Detection. The framework is composed of three modules, Spatial Alignment, Global Attention and Fusion Layer, which are applied successively to combine features from query and support images.

The inputs of the framework are:

query_features List[Tensor(B, C, H, W)]: Query features at different levels. For each level, the features are of shape Batch x Channels x Height x Width.
support_features List[Tensor(N, C, H', W')]: Support features at different levels. The first dimension corresponds to the number of support images, grouped by class: N = N_WAY * K_SHOT.
support_targets List[BoxList]: Bounding boxes for the objects in each support image.
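As a rough illustration of these shapes, the sketch below builds dummy inputs with NumPy arrays standing in for tensors. The sizes, the two-level feature pyramid, the (x1, y1, x2, y2) box layout, and the reading of "grouped by class" as consecutive blocks of K_SHOT images are all assumptions made for the example; the repository itself works with tensors and BoxList objects.

```python
import numpy as np

# Illustrative sizes only; none of these values come from the repository.
B, C = 2, 256            # query batch size, number of channels
N_WAY, K_SHOT = 3, 5     # classes per episode, support images per class
N = N_WAY * K_SHOT       # total number of support images

# One (height, width) pair per feature level.
query_sizes = [(64, 64), (32, 32)]
support_sizes = [(16, 16), (8, 8)]

query_features = [np.zeros((B, C, h, w), dtype=np.float32) for h, w in query_sizes]
support_features = [np.zeros((N, C, h, w), dtype=np.float32) for h, w in support_sizes]

# Stand-in for List[BoxList]: one array of boxes per support image,
# here a single (x1, y1, x2, y2) box each.
support_targets = [np.array([[0.0, 0.0, 10.0, 10.0]]) for _ in range(N)]

# Assumed grouping: image i belongs to class i // K_SHOT.
support_class = [i // K_SHOT for i in range(N)]
```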

The framework can be configured using a separate config file. Examples of such files are available under /config_files/aaf_framework/. The structure of these files is simple:

ALIGN_FIRST: # True/False. Alignment runs before Attention when True
OUT_CH: # Number of features output by the fusion layer
ALIGNMENT:
    MODE: # Name of the alignment module selected
ATTENTION:
    MODE: # Name of the attention module selected
FUSION:
    MODE: # Name of the fusion module selected

File name	Method	Alignment	Attention	Fusion
`identity.yaml`	Identity	IDENTITY	IDENTITY	IDENTITY
`feature_reweighting.yaml`	FSOD via feature reweighting	IDENTITY	REWEIGHTING_BATCH	IDENTITY
`meta_faster_rcnn.yaml`	Meta Faster-RCNN	SIMILARITY_ALIGN	META_FASTER	META_FASTER
`self_adapt.yaml`	Self-adaptive attention for FSOD	IDENTITY_NO_REPEAT	GRU	IDENTITY
`dynamic.yaml`	Dynamic relevance learning	IDENTITY	INTERPOLATE	DYNAMIC_R
`dana.yaml`	Dual Awareness Attention for FSOD	CISA	BGA	HADAMARD
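To make the file structure concrete, here is a sketch of the `dana.yaml` row written as a nested Python dict, together with a small consistency check. The `ALIGN_FIRST` and `OUT_CH` values are assumed for illustration, and `check_aaf_config` is a hypothetical helper that is not part of the repository; only the mode names are taken from this README.

```python
# Mode names copied from the module tables in this README.
ALIGNMENT_MODES = {"IDENTITY", "IDENTITY_NO_REPEAT", "SIMILARITY_ALIGN", "CISA"}
ATTENTION_MODES = {
    "IDENTITY", "REWEIGHTING", "REWEIGHTING_BATCH", "SELF_ATTENTION",
    "BGA", "META_FASTER", "POOLING", "INTERPOLATE", "GRU",
}
FUSION_MODES = {
    "IDENTITY", "ADD", "HADAMARD", "SUBSTRACT", "CONCAT", "META_FASTER", "DYNAMIC_R",
}


def check_aaf_config(cfg):
    """Return a list of problems found in an AAF config dict (hypothetical helper)."""
    problems = []
    if not isinstance(cfg.get("ALIGN_FIRST"), bool):
        problems.append("ALIGN_FIRST must be True or False")
    out_ch = cfg.get("OUT_CH")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(out_ch, bool) or not isinstance(out_ch, int) or out_ch <= 0:
        problems.append("OUT_CH must be a positive integer")
    sections = (
        ("ALIGNMENT", ALIGNMENT_MODES),
        ("ATTENTION", ATTENTION_MODES),
        ("FUSION", FUSION_MODES),
    )
    for section, allowed in sections:
        mode = cfg.get(section, {}).get("MODE")
        if mode not in allowed:
            problems.append(f"{section}.MODE is not a known module: {mode!r}")
    return problems


dana_like = {
    "ALIGN_FIRST": True,   # assumed value
    "OUT_CH": 256,         # assumed value
    "ALIGNMENT": {"MODE": "CISA"},
    "ATTENTION": {"MODE": "BGA"},
    "FUSION": {"MODE": "HADAMARD"},
}
problems = check_aaf_config(dana_like)
```

An empty `problems` list means every `MODE` matches a name from the tables; a misspelled mode is reported with its section.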

The path to the AAF config file should be specified inside the master config file (i.e. for the whole network) under FEWSHOT.AAF.CFG.

For each module, the classes implementing the available choices are grouped in a single file: /modelling/aaf/alignment.py, /modelling/aaf/attention.py and /modelling/aaf/fusion.py.
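The way the three modules chain together, and the effect of `ALIGN_FIRST`, can be sketched with plain callables. The function name `run_aaf` and the `(query, support) -> (query, support)` stage signature are assumptions for illustration; the actual classes in the files above may use different interfaces.

```python
def run_aaf(query, support, align, attend, fuse, align_first=True):
    """Apply alignment and attention in the order chosen by ALIGN_FIRST, then fuse.

    Hypothetical signatures: `align` and `attend` return an updated
    (query, support) pair, `fuse` returns the merged features.
    """
    first, second = (align, attend) if align_first else (attend, align)
    query, support = first(query, support)
    query, support = second(query, support)
    return fuse(query, support)


# Toy stages that only record the order in which they are called.
calls = []


def make_stage(name):
    def stage(query, support):
        calls.append(name)
        return query, support
    return stage


def toy_fuse(query, support):
    calls.append("fuse")
    return query + support


result = run_aaf(1.0, 2.0, make_stage("align"), make_stage("attend"), toy_fuse,
                 align_first=False)
```

With `align_first=False` the recorded order is attention, alignment, fusion; with the default it is alignment, attention, fusion.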

Spatial Alignment

Spatial Alignment spatially reorganizes the features of one feature map to match those of another. The idea is to align similar features in both maps so that they are easier to compare.

Name	Description
IDENTITY	Repeats the features to match BNCHW and NBCHW dimensions
IDENTITY_NO_REPEAT	Identity without repetition
SIMILARITY_ALIGN	Computes a similarity matrix between support and query and aligns support to query accordingly.
CISA	CISA block from this method
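The idea behind a similarity-based alignment can be sketched for a single query map and a single support map. This is a minimal NumPy illustration, not the repository's SIMILARITY_ALIGN class: the cosine similarity, the softmax normalization, and the function name are assumptions, and no learned layers are involved.

```python
import numpy as np


def similarity_align(query, support):
    """Align one support map (C, Hs, Ws) to the layout of one query map (C, Hq, Wq)."""
    c, hq, wq = query.shape
    q = query.reshape(c, -1).T      # (Hq*Wq, C): one feature vector per query position
    s = support.reshape(c, -1).T    # (Hs*Ws, C): one feature vector per support position

    # Cosine similarity between every query position and every support position.
    q_unit = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    s_unit = s / (np.linalg.norm(s, axis=1, keepdims=True) + 1e-8)
    similarity = q_unit @ s_unit.T  # (Hq*Wq, Hs*Ws)

    # Softmax over support positions turns similarities into mixing weights.
    weights = np.exp(similarity - similarity.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # Each query position receives a weighted average of the support features.
    aligned = weights @ s           # (Hq*Wq, C)
    return aligned.T.reshape(c, hq, wq)


rng = np.random.default_rng(0)
query = rng.normal(size=(8, 6, 6))
support = rng.normal(size=(8, 4, 4))
aligned_support = similarity_align(query, support)
```

The aligned support map has the spatial size of the query, so point-wise comparison between the two becomes possible even though the inputs had different resolutions.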

Global Attention

Global Attention highlights some features of a map according to an attention vector computed globally on another one. The idea is to leverage global, and hopefully semantic, information.

Name	Description
IDENTITY	Simply passes features to the next module.
REWEIGHTING	Reweights query features using globally pooled vectors from support.
REWEIGHTING_BATCH	Same as above but support examples are the same for the whole batch.
SELF_ATTENTION	Same as above but attention vectors are computed from the alignment matrix between query and support.
BGA	BGA blocks from this method
META_FASTER	Attention block from this method
POOLING	Pools query and support features to the same size.
INTERPOLATE	Upsamples support features to match query size.
GRU	Computes attention vectors through a graph representation using a GRU.
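As an illustration of the reweighting idea, the sketch below pools each support map into one vector and uses it to scale the query channels. It is a simplified NumPy version under stated assumptions: average pooling is used as the global pooling, there are no learned layers, and the BNCHW output layout and function name are chosen for the example.

```python
import numpy as np


def reweight_query(query, support):
    """Scale query channels by globally pooled support vectors.

    query:   (B, C, H, W)
    support: (N, C, Hs, Ws)
    returns: (B, N, C, H, W), one reweighted copy of the query per support image
    """
    # Global average pooling: one C-dimensional vector per support image.
    support_vectors = support.mean(axis=(2, 3))                        # (N, C)
    # Broadcast (B, 1, C, H, W) against (1, N, C, 1, 1).
    return query[:, None] * support_vectors[None, :, :, None, None]


rng = np.random.default_rng(0)
query = rng.normal(size=(2, 4, 5, 5))
support = np.ones((3, 4, 2, 2))
reweighted = reweight_query(query, support)
```

Because the support maps here are all ones, every pooled vector is one and each reweighted copy equals the original query; with real features, channels that are strong in a support image are amplified in the query.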

Fusion Layer

The Fusion Layer directly combines the features from support and query. These maps must have the same dimensions for point-wise operations, so fusion is often used together with alignment.

Name	Description
IDENTITY	Returns only adapted query features.
ADD	Point-wise sum between query and support features.
HADAMARD	Point-wise multiplication between query and support features.
SUBSTRACT	Point-wise subtraction between query and support features.
CONCAT	Channel concatenation of query and support features.
META_FASTER	Fusion layer from this method
DYNAMIC_R	Fusion layer from this method
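The point-wise modes in the table can be sketched in a few lines of NumPy. This is only an illustration of the operations: the dispatch table and the `fuse` function are hypothetical, the channel axis is assumed to be axis 1 of a (B, C, H, W) array, and any learned layers the real fusion modules may apply are left out.

```python
import numpy as np

# Keys reuse the mode names from the table above.
FUSION_OPS = {
    "IDENTITY": lambda q, s: q,
    "ADD": lambda q, s: q + s,
    "HADAMARD": lambda q, s: q * s,
    "SUBSTRACT": lambda q, s: q - s,
    "CONCAT": lambda q, s: np.concatenate([q, s], axis=1),
}


def fuse(query, support, mode):
    """Combine query and support features of shape (B, C, H, W)."""
    if mode != "IDENTITY" and query.shape != support.shape:
        raise ValueError("query and support must have the same shape for fusion")
    return FUSION_OPS[mode](query, support)


query = np.full((2, 4, 3, 3), 3.0)
support = np.full((2, 4, 3, 3), 2.0)

added = fuse(query, support, "ADD")
product = fuse(query, support, "HADAMARD")
difference = fuse(query, support, "SUBSTRACT")
stacked = fuse(query, support, "CONCAT")
```

All point-wise modes keep the input shape, while CONCAT doubles the channel dimension. The shape check makes explicit why an alignment step is usually needed first.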

Training and evaluation

Training and evaluation scripts are available.

TODO: Give code snippet to run training with a specified config file (modify main) Basically create 2 scripts train.py and eval.py with arg config file.

DataHandler

Explain DataHandler class a bit.

Installation

Dependencies for this project can be installed with `conda create --name <env> --file requirements.txt`. Please note that not all of these requirements are necessary; the list will be updated soon.

FCOS must be installed from source. Depending on the versions of the Python packages you use, some issues may appear after installation:

cpu/vision.h file not found: replace all occurrences in the FCOS source by vision.h (see this issue).
Error related to AT_CHECK with PyTorch > 1.5: replace all occurrences by TORCH_CHECK (see this issue).
Error related to torch._six.PY36: replace all occurrences of PY36 by PY37.
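The three replacements above can be applied by hand or with a small script. The sketch below is one possible way to do it, not a tool shipped with this repository or with FCOS: the function names and the list of file suffixes are assumptions, and the replacement is a plain textual substitution, so reviewing the resulting diff before building is advisable.

```python
from pathlib import Path

# (old, new) pairs taken from the three fixes listed above.
REPLACEMENTS = [
    ("cpu/vision.h", "vision.h"),
    ("AT_CHECK", "TORCH_CHECK"),
    ("PY36", "PY37"),
]


def patch_source(text):
    """Apply every textual replacement to one file's contents."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text


def patch_tree(root, suffixes=(".h", ".cpp", ".cu", ".py")):
    """Patch matching files under `root` in place and return the changed paths.

    The suffix list is an assumption about which FCOS files are affected.
    """
    changed = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            original = path.read_text(encoding="utf-8")
            patched = patch_source(original)
            if patched != original:
                path.write_text(patched, encoding="utf-8")
                changed.append(path)
    return changed


sample = '#include "cpu/vision.h"\nAT_CHECK(x.is_cuda(), "x must be CUDA");'
patched_sample = patch_source(sample)
```

Calling `patch_tree` on a local checkout of the FCOS sources (path not shown here) would rewrite the files in place, so running it on a clean working tree makes the changes easy to inspect or revert.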

Results

Results on Pascal VOC, COCO and DOTA.

Owner

Pierre Le Jeune