QAHOI: Query-Based Anchors for Human-Object Interaction Detection (paper)

Related tags

Deep LearningQAHOI
Overview

QAHOI

QAHOI: Query-Based Anchors for Human-Object Interaction Detection (paper)

Requirements

  • PyTorch >= 1.5.1
  • torchvision >= 0.6.1
pip install -r requirements.txt
  • Compiling CUDA operators
cd ./models/ops
sh ./make.sh
# test
python test.py

Dataset Preparation

Please follow the HICO-DET dataset preparation of GGNet.

After preparation, the data folder as follows:

data
├── hico_20160224_det
|   ├── images
|   |   ├── test2015
|   |   └── train2015
|   └── annotations
|       ├── anno_list.json
|       ├── corre_hico.npy
|       ├── file_name_to_obj_cat.json
|       ├── hoi_id_to_num.json
|       ├── hoi_list_new.json
|       ├── test_hico.json
|       └── trainval_hico.json

Evaluation

Download the model to params folder.

  • We test the model with NVIDIA A6000 GPU, Pytorch 1.9.0, Python 3.8 and CUDA 11.2.
Model Full (def) Rare (def) None-Rare (def) Full (ko) Rare (ko) None-Rare (ko) Download
Swin-Tiny 28.47 22.44 30.27 30.99 24.83 32.84 model
Swin-Base*+ 33.58 25.86 35.88 35.34 27.24 37.76 model
Swin-Large*+ 35.78 29.80 37.56 37.59 31.36 39.36 model

Evaluating the model by running the following command.

--eval_extra to evaluate the spatio contribution.

mAP_default.json and mAP_ko.json will save in current folder.

  • Swin-Tiny
python main.py --resume params/QAHOI_swin_tiny_mul3.pth --backbone swin_tiny --num_feature_levels 3 --use_nms --eval
  • Swin-Base*+
python main.py --resume params/QAHOI_swin_base_384_22k_mul3.pth --backbone swin_base_384 --num_feature_levels 3 --use_nms --eval
  • Swin-Large*+
python main.py --resume params/QAHOI_swin_large_384_22k_mul3.pth --backbone swin_large_384 --num_feature_levels 3 --use_nms --eval

Training

Download the pre-trained swin-tiny model from Swin-Transformer to params folder.

Training QAHOI with Swin-Tiny from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_tiny \
        --pretrained params/swin_tiny_patch4_window7_224.pth \
        --output_dir logs/swin_tiny_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms

Training QAHOI with Swin-Base*+ from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_base_384 \
        --pretrained params/swin_base_patch4_window7_224_22k.pth \
        --output_dir logs/swin_base_384_22k_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms

Training QAHOI with Swin-Large*+ from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_large_384 \
        --pretrained params/swin_large_patch4_window12_384_22k.pth \
        --output_dir logs/swin_large_384_22k_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms

Citation

@article{cjw,
  title={QAHOI: Query-Based Anchors for Human-Object Interaction Detection},
  author={Junwen Chen and Keiji Yanai},
  journal={arXiv preprint arXiv:2112.08647},
  year={2021}
}
This is an early in-development version of training CLIP models with hivemind.

A transformer that does not hog your GPU memory This is an early in-development codebase: if you want a stable and documented hivemind codebase, look

<a href=[email protected]"> 4 Nov 06, 2022
codes for IKM (arXiv2021, Submitted to IEEE Trans)

Image-specific Convolutional Kernel Modulation for Single Image Super-resolution This repository is for IKM introduced in the following paper Yuanfei

Yuanfei Huang 9 Dec 29, 2022
Spatial Intention Maps for Multi-Agent Mobile Manipulation (ICRA 2021)

spatial-intention-maps This code release accompanies the following paper: Spatial Intention Maps for Multi-Agent Mobile Manipulation Jimmy Wu, Xingyua

Jimmy Wu 70 Jan 02, 2023
Tree Nested PyTorch Tensor Lib

DI-treetensor treetensor is a generalized tree-based tensor structure mainly developed by OpenDILab Contributors. Almost all the operation can be supp

OpenDILab 167 Dec 29, 2022
Code accompanying the paper "Knowledge Base Completion Meets Transfer Learning"

Knowledge Base Completion Meets Transfer Learning This code accompanies the paper Knowledge Base Completion Meets Transfer Learning published at EMNLP

14 Nov 27, 2022
Recovering Brain Structure Network Using Functional Connectivity

Recovering-Brain-Structure-Network-Using-Functional-Connectivity Framework: Papers: This repository provides a PyTorch implementation of the models ad

5 Nov 30, 2022
CLASP - Contrastive Language-Aminoacid Sequence Pretraining

CLASP - Contrastive Language-Aminoacid Sequence Pretraining Repository for creating models pretrained on language and aminoacid sequences similar to C

Michael Pieler 133 Dec 29, 2022
A TensorFlow implementation of Neural Program Synthesis from Diverse Demonstration Videos

ViZDoom http://vizdoom.cs.put.edu.pl ViZDoom allows developing AI bots that play Doom using only the visual information (the screen buffer). It is pri

Hyeonwoo Noh 1 Aug 19, 2020
Monocular Depth Estimation - Weighted-average prediction from multiple pre-trained depth estimation models

merged_depth runs (1) AdaBins, (2) DiverseDepth, (3) MiDaS, (4) SGDepth, and (5) Monodepth2, and calculates a weighted-average per-pixel absolute dept

Pranav 39 Nov 21, 2022
Fast and exact ILP-based solvers for the Minimum Flow Decomposition (MFD) problem, and variants of it.

MFD-ILP Fast and exact ILP-based solvers for the Minimum Flow Decomposition (MFD) problem, and variants of it. The solvers are implemented using Pytho

Algorithmic Bioinformatics Group @ University of Helsinki 4 Oct 23, 2022
Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

StrengthNet Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis" https://arxiv.org/abs/2110

RuiLiu 65 Dec 20, 2022
Official code repository for the publication "Latent Equilibrium: A unified learning theory for arbitrarily fast computation with arbitrarily slow neurons"

Latent Equilibrium: A unified learning theory for arbitrarily fast computation with arbitrarily slow neurons This repository contains the code to repr

Computational Neuroscience, University of Bern 3 Aug 04, 2022
Scalable Optical Flow-based Image Montaging and Alignment

SOFIMA SOFIMA (Scalable Optical Flow-based Image Montaging and Alignment) is a tool for stitching, aligning and warping large 2d, 3d and 4d microscopy

Google Research 16 Dec 21, 2022
A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions

A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions Kapoutsis, A.C., Chatzichristofis,

Athanasios Ch. Kapoutsis 5 Oct 15, 2022
Code for the paper "Next Generation Reservoir Computing"

Next Generation Reservoir Computing This is the code for the results and figures in our paper "Next Generation Reservoir Computing". They are written

OSU QuantInfo Lab 105 Dec 20, 2022
Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

Nvdiffrast – Modular Primitives for High-Performance Differentiable Rendering Modular Primitives for High-Performance Differentiable Rendering Samuli

NVIDIA Research Projects 675 Jan 06, 2023
Toolbox to analyze temporal context invariance of deep neural networks

PyTCI A toolbox that estimates the integration window of a sensory response using the "Temporal Context Invariance" paradigm (TCI). The TCI method Int

4 Oct 23, 2022
Locally Constrained Self-Attentive Sequential Recommendation

LOCKER This is the pytorch implementation of this paper: Locally Constrained Self-Attentive Sequential Recommendation. Zhankui He, Handong Zhao, Zhe L

Zhankui (Aaron) He 8 Jul 30, 2022
A repo for Causal Imitation Learning under Temporally Correlated Noise

CausIL A repo for Causal Imitation Learning under Temporally Correlated Noise. Running Experiments To re-train an expert, run: python experts/train_ex

Gokul Swamy 5 Nov 01, 2022
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

Machine Learning From Scratch About Python implementations of some of the fundamental Machine Learning models and algorithms from scratch. The purpose

Erik Linder-Norén 21.8k Jan 09, 2023