Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"

Overview

QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information

by Masato Tamura, Hiroki Ohashi, and Tomoaki Yoshinaga.

This repository contains the official implementation of the paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information", which was accepted to CVPR 2021.

QPIC is implemented by extending the recently proposed object detector DETR. It leverages query-based detection and the attention mechanism of the transformer, and as a result achieves high HOI detection performance with simple detection heads.

Example attention maps.

Preparation

Dependencies

Our implementation uses external libraries such as NumPy and PyTorch. You can resolve the dependencies with the following commands.

pip install numpy
pip install -r requirements.txt

Note that the second command may print errors while installing pycocotools, but they can be ignored.

Dataset

HICO-DET

The HICO-DET dataset can be downloaded here. After the download finishes, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.

Instead of using the original annotation files, we use the annotation files provided by the PPDM authors. The annotation files can be downloaded from here. The downloaded annotation files have to be placed as follows.

qpic
 |─ data
 │   └─ hico_20160224_det
 |       |─ annotations
 |       |   |─ trainval_hico.json
 |       |   |─ test_hico.json
 |       |   └─ corre_hico.npy
 :       :
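
If you prefer to script the extraction, the following is a minimal sketch using Python's tarfile module (it assumes the tarball has already been downloaded to the repository root; the PPDM annotation files still have to be downloaded and placed manually):

import tarfile
from pathlib import Path

# Unpack the HICO-DET images and original annotations into the data directory.
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
with tarfile.open("hico_20160224_det.tar.gz") as tar:
    tar.extractall(data_dir)

# After placing the PPDM annotation files, the annotations directory should contain:
anno_dir = data_dir / "hico_20160224_det" / "annotations"
print(sorted(p.name for p in anno_dir.iterdir()))
# expected: ['corre_hico.npy', 'test_hico.json', 'trainval_hico.json']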

V-COCO

First clone the V-COCO repository from here, and then follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and create the directories as follows.

qpic
 |─ data
 │   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :

For our implementation, the annotation files have to be converted to the HOIA format. The conversion can be conducted as follows.

PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python 2 can be used for this conversion because vsrl_utils.py in the v-coco repository raises an error with Python 3.

The V-COCO annotations in the HOIA format, corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json, will be generated in the annotations directory.
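
As a quick sanity check, the generated files can be loaded and inspected. The sketch below only assumes that each HOIA-format file is a JSON list of per-image entries:

import json

# Load the converted V-COCO annotations and look at the first entry (a sanity check only).
with open("data/v-coco/annotations/trainval_vcoco.json") as f:
    annotations = json.load(f)

print(len(annotations), "images")
print("keys of the first entry:", sorted(annotations[0].keys()))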

Pre-trained parameters

QPIC has to be pre-trained with the COCO object detection dataset. For the HICO-DET training, this pre-training can be omitted by using the parameters of DETR. The parameters can be downloaded from here for ResNet50, and for ResNet101. For the V-COCO training, the pre-training has to be carried out because some images of the V-COCO evaluation set are contained in the training set of DETR. We excluded those images and pre-trained QPIC ourselves for the V-COCO evaluation.

After downloading or pre-training, move the pre-trained parameters to the params directory and convert them with the following command (e.g., for the downloaded ResNet50 parameters).

python convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre.pth
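
Before starting the training, it can be useful to confirm that the converted checkpoint loads correctly. The following sketch assumes a standard PyTorch checkpoint that may nest its weights under a 'model' key:

import torch

# Load the converted parameters on the CPU and list a few tensors as a sanity check.
ckpt = torch.load("params/detr-r50-pre.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(len(state_dict), "parameter tensors")
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))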

Training

After the preparation, you can start the training with the following command.

For the HICO-DET training:

python main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1

For the V-COCO training:

python main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file vcoco \
        --hoi_path data/v-coco \
        --num_obj_classes 80 \
        --num_verb_classes 29 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1

If you have multiple GPUs on your machine, you can use them to speed up the training. The number of GPUs is specified with the --nproc_per_node option. The following command starts the HICO-DET training with 8 GPUs.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1

Evaluation

The evaluation is conducted at the end of each epoch during the training. The results are written in logs/log.txt like below:

"test_mAP": 0.29061250833779456, "test_mAP rare": 0.21910348492395765, "test_mAP non-rare": 0.31197234650036926

test_mAP, test_mAP rare, and test_mAP non-rare are the results of the Default full, rare, and non-rare settings, respectively.
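
To track the best epoch over a whole run, the log can be parsed line by line. The sketch below assumes that each line of logs/log.txt is a single JSON object like the excerpt above:

import json

# Find the epoch with the highest full-setting test mAP in the training log.
best_epoch, best_map = None, 0.0
with open("logs/log.txt") as f:
    for line_no, line in enumerate(f):
        stats = json.loads(line)
        epoch = stats.get("epoch", line_no)  # fall back to the line number if no epoch field is logged
        if stats.get("test_mAP", 0.0) > best_map:
            best_epoch, best_map = epoch, stats["test_mAP"]
print(f"best epoch: {best_epoch}, test mAP: {best_map:.4f}")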

For the official evaluation of V-COCO, a pickle file of detection results has to be generated. You can generate the file as follows.

python generate_vcoco_official.py \
        --param_path logs/checkpoint.pth \
        --save_path vcoco.pickle \
        --hoi_path data/v-coco
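
Once the pickle file has been generated, it can be scored with the official evaluator provided by the v-coco repository (run it with PYTHONPATH=data/v-coco so that vsrl_eval can be imported). The following is a minimal sketch; the test-split annotation paths are assumptions based on the directory layout above and may need to be adjusted:

from vsrl_eval import VCOCOeval  # provided by the v-coco repository

# Run the official V-COCO evaluation on the generated detections.
vsrl_annot_file = "data/v-coco/data/vcoco/vcoco_test.json"
coco_file = "data/v-coco/data/instances_vcoco_all_2014.json"
split_file = "data/v-coco/data/splits/vcoco_test.ids"

vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)
vcocoeval._do_eval("vcoco.pickle", ovr_thresh=0.5)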

Results

HICO-DET.

                    Full (D)   Rare (D)   Non-rare (D)   Full (KO)   Rare (KO)   Non-rare (KO)
QPIC (ResNet50)     29.07      21.85      31.23          31.68       24.14       33.93
QPIC (ResNet101)    29.90      23.92      31.69          32.38       26.06       34.27

D: Default, KO: Known object

V-COCO.

                    Scenario 1   Scenario 2
QPIC (ResNet50)     58.8         61.0
QPIC (ResNet101)    58.3         60.7

Citation

Please consider citing our paper if it helps your research.

@inproceedings{tamura_cvpr2021,
  author    = {Tamura, Masato and Ohashi, Hiroki and Yoshinaga, Tomoaki},
  title     = {{QPIC}: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information},
  booktitle = {CVPR},
  year      = {2021},
}
Comments
  • Reproduction the result on VCOCO dataset

    Hi, I can reproduce the result on the HICO-DET dataset, but I only get 51 on the V-COCO dataset. Could you please provide the log.txt? I am not sure where I went wrong. Thanks.

    opened by SherlockHolmes221 7
  • How to get the file logs/checkpoint.pth?

    Thanks for your interesting work! I have a question about how to get the file logs/checkpoint.pth; I have not seen it in your Google Drive. Looking forward to your reply.

    opened by Lunarli 4
  • number of decoder layers impact

    Hi, thanks for the interesting work. Does the number of decoder layers have a significant impact on performance, as in DETR? Also, I can't find how many V100s you used. Maybe 8?

    opened by jihwanp 3
  • Why the pretrained VCOCO model has 81 object classes?

    When I tried to evaluate the V-COCO model you provide, I had to set the parameter --num_obj_classes to 81. What is the reason for this setting? And should I set --num_obj_classes 81 during training?

    Thanks for your reply!

    opened by weiyunfei 3
  • How to pretrain detector for VCOCO?

    Hi, thanks for your excellent work. I am confused about the pre-training for the V-COCO training. In your README.md, you stated the pretraining has to be carried out for the V-COCO training since some images of the V-COCO evaluation set are contained in the training set of DETR, while the given training command just used the pretrained DETR model without any more pretraining. Should I apply a pretraining on the dataset excluding the V-COCO evaluation images or just follow your training command?

    opened by weiyunfei 3
  • Different AP Results for pre-trained VCOCO models

    Hi folks,

    First of all, thank you for sharing this repository. I would like to ask a specific question about the evaluation results of the provided pre-trained V-COCO models. I followed the instructions you provided (for constructing annotation files for V-COCO and obtaining pickle files) to get the results. However, compared with the V-COCO results in the table, I got different average role AP results. For instance, I am providing the Scenario 1 role AP output of R-50 QPIC in the attached screenshot. I am wondering about possible reasons for this issue; could you please provide some assistance?

    qpic_resnet50_sc1_roleap

    opened by cancam 2
  • How to convert the pretrained DETR model for V-COCO style?

    Hi! Sorry for bothering again. When I tried to train a V-COCO model, I found that the object classifier of the pre-trained DETR model is only 81-way (including background), while in #6 you claimed that the V-COCO model has an 82-way object classifier (including background and a missing category id). So what should I do to convert the classifier parameters from 81 classes into 82 classes?

    opened by weiyunfei 2
  • For V-COCO result reproduction, which evaluation code should I use?

    I tried to reproduce your results on V-COCO, and I encountered some problems. I found that the official evaluation code can only achieve 56.5 for Scenario 1 and 58.6 for Scenario 2 with your pre-trained model, while with the vcoco_eval.py in your project which is in PPDM style, the result is 58.35. However, the evaluation process in vcoco_eval.py seems like Scenario 2 in the official evaluation. Which code should I use to reproduce your experiments on V-COCO? Moreover, if I need to use the vcoco_eval.py, what should I do to get the evaluation results for both scenarios?

    opened by weiyunfei 2
  • V-COCO Evaluation Error

    Hello,

    Many thanks for your great work.

    I am trying to evaluate your pre-trained models on V-COCO.

    1. So, I first generate the official detection pickle via:
    
    python generate_vcoco_official.py \
            --param_path ./params/qpic_resnet50_vcoco.pth \
            --save_path ./logs_vcoco/vcoco.pickle \
            --hoi_path ./data/v-coco
    
    2. Later, I use the official evaluation code via the following:
    from vsrl_eval import VCOCOeval
    
    vsrl_annot_file_s='./data/v-coco/data/vcoco/vcoco_val.json'
    split_file_s='./data/v-coco/data/splits/vcoco_val.ids'
    
    coco_file_s='./data/v-coco/data/instances_vcoco_all_2014.json'
    vcocoeval = VCOCOeval(vsrl_annot_file_s, coco_file_s, split_file_s)
    
    file_name= './logs_vcoco/vcoco.pickle'
    vcocoeval._do_eval(file_name, ovr_thresh=0.5)
    

    Please note that I adapted the latter script from the VSG-Net repo. I face the following error during evaluation:

    zero-size array to reduction operation minimum which has no identity

    at assert(np.amax(rec) <= 1) during execution of the _do_agent_eval() function.

    I wonder if this is a common error, and how it can be mitigated?

    Many thanks.

    opened by kilickaya 1
  • Distance AP

    Hi, I have a question about distance-wise AP.

    Does an HOI instance that participates in calculating AP correspond to distance=0.1 when the distance between the human and object centers is between 0.1 and 0.2?

    Alternatively, I would fully appreciate it if you could provide code for calculating distance-wise AP. Thanks.

    opened by jihwanp 1
  • Some useless steps in focal loss

    Thanks for your interesting work! The neg_weights term is meaningful only in heatmap-based problems, e.g., Gaussian-based detection. Removing those steps may avoid some misunderstanding.

    opened by YueLiao 1
  • a bug of hico dataset file?

    Firstly, thanks for your nice work, but I found a small bug here: https://github.com/hitachi-rd-cv/qpic/blob/main/datasets/hico.py#L62. Shouldn't the constraint be that the number of HOI tuples (len(img_anno['hoi_annotation'])) must not exceed the number of queries (self.num_queries), rather than the number of boxes (len(img_anno['annotations']))? Is that right? Waiting for your thoughts, thank you very much!

    opened by jojolee123 0
  • How to convert .onnx model from qpic.pth

    I am facing an issue with the ONNX conversion of this model. I tried

    torch.onnx.export(pth_model, dummy_input, "onnx_model.onnx", opset_version=11), where dummy_input has shape [1, 3, 720, 1280] (the same as for this model).

    However, the result is [array([[[nan, nan, nan], [nan, nan, nan]]], dtype=float32), array([[[nan, nan], [nan, nan]]], dtype=float32), array([[[nan, nan, nan, nan], [nan, nan, nan, nan]]], dtype=float32), array([[[nan, nan, nan, nan], [nan, nan, nan, nan]]], dtype=float32)]

    It's all NaNs!

    Please comment on how to fix it.

    opened by bj-noh 0
  • Is your architecture end-to-end HOI?

    Hello, thanks for your implementation. Is your architecture end-to-end HOI? Does that mean it does not require any pre-computed features or separate feature extraction? For example, can I feed my own features to your network?

    opened by mansooreh1 0
  • The influence from aux_loss

    Hi authors,

    thanks for your implementation, which has helped my research a lot. In the supplementary materials of your paper, you stated that the auxiliary loss is used following DETR. What happens if it is not used for HOI training? I wonder if there is any corresponding experiment on QPIC.

    opened by hwfan 0
  • custom dataset implementation

    Hi, thank you so much for this awesome repo. I am currently trying a custom dataset implementation using QPIC. The dataset's annotation file is in COCO format. The convert_vcoco_annotations.py script converts V-COCO to the HOIA format, but I am trying to convert a COCO-format annotation to HOIA by manually adding the interactions. In this process, I am unable to understand a few things about the HOIA format: what do {subject_id, category_id, object_id} mean in the "hoi_annotations" dict key, and what does {category_id} mean in the "annotations" dict key?

    opened by dharneeshkar004 0
N-RPG - A novel futuristic role-playing game

N-RPG This README will be the project's cover page. Contents: it will contain the presen

4 Mar 15, 2022
A novel framework to automatically learn high-quality scanning of non-planar, complex anisotropic appearance.

appearance-scanner About This repository is an implementation of the neural network proposed in Free-form Scanning of Non-planar Appearance with Neura

Xiaohe Ma 14 Oct 18, 2022
Brain tumor detection using CNN (InceptionResNetV2 Model)

Brain-Tumor-Detection Building a detection model using a convolutional neural network in Tensorflow & Keras. Used brain MRI images. InceptionResNetV2

1 Feb 13, 2022
A system for quickly generating training data with weak supervision

Programmatically Build and Manage Training Data Announcement The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI applicat

Snorkel Team 5.4k Jan 02, 2023
Create animations for the optimization trajectory of neural nets

Animating the Optimization Trajectory of Neural Nets loss-landscape-anim lets you create animated optimization path in a 2D slice of the loss landscap

Logan Yang 81 Dec 25, 2022
Estimating Example Difficulty using Variance of Gradients

Estimating Example Difficulty using Variance of Gradients This repository contains source code necessary to reproduce some of the main results in the

Chirag Agarwal 48 Dec 26, 2022
My coursework for Machine Learning (2021 Spring) at National Taiwan University (NTU)

Machine Learning 2021 Machine Learning (NTU EE 5184, Spring 2021) Instructor: Hung-yi Lee Course Website : (https://speech.ee.ntu.edu.tw/~hylee/ml/202

100 Dec 26, 2022
Object tracking using YOLO and a tracker(KCF, MOSSE, CSRT) in openCV

Object tracking using YOLO and a tracker(KCF, MOSSE, CSRT) in openCV File YOLOv3 weight can be downloaded

Ngoc Quyen Ngo 2 Mar 27, 2022
Hierarchical Uniform Manifold Approximation and Projection

HUMAP Hierarchical Manifold Approximation and Projection (HUMAP) is a technique based on UMAP for hierarchical non-linear dimensionality reduction. HU

Wilson Estécio Marcílio Júnior 160 Jan 06, 2023
Apollo optimizer in tensorflow

Apollo Optimizer in Tensorflow 2.x Notes: Warmup is important with Apollo optimizer, so be sure to pass in a learning rate schedule vs. a constant lea

Evan Walters 1 Nov 09, 2021
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".

Codebase for learning control flow in transformers The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformer

Csordás Róbert 24 Oct 15, 2022
Jittor Medical Segmentation Lib -- The assignment of Pattern Recognition course (2021 Spring) in Tsinghua University

THU Pattern Recognition, Spring 2021 -- Jittor medical image segmentation. Model list: this repository collects the following models implemented by students with the jittor framework for the course assignments: UNet SegNet DeepLab V2 DANet EANet HarDNet and its variant HarDNet_alter PSPNet OCNet OCRNet DL

48 Dec 26, 2022
SPCL: A New Framework for Domain Adaptive Semantic Segmentation via Semantic Prototype-based Contrastive Learning

SPCL SPCL: A New Framework for Domain Adaptive Semantic Segmentation via Semantic Prototype-based Contrastive Learning Update on 2021/11/25: ArXiv Ver

Binhui Xie (谢斌辉) 11 Oct 29, 2022
A decent AI that solves daily Wordle puzzles. Works with different websites with similar wordlists,.

Wordle-AI A decent AI that solves daily "Wordle" puzzles. Works with different websites with similar wordlists. When prompted with "Word:" enter the w

Ethan 1 Feb 10, 2022
PED: DETR for Crowd Pedestrian Detection

PED: DETR for Crowd Pedestrian Detection Code for PED: DETR For (Crowd) Pedestrian Detection Paper PED: DETR for Crowd Pedestrian Detection Installati

36 Sep 13, 2022
Code for Contrastive-Geometry Networks for Generalized 3D Pose Transfer

CGTransformer Code for our AAAI 2022 paper "Contrastive-Geometry Transformer network for Generalized 3D Pose Transfer" Contrastive-Geometry Transforme

18 Jun 28, 2022
SAS output to EXCEL converter for Cornell/MIT Language and acquisition lab

CORNELLSASLAB SAS output to EXCEL converter for Cornell/MIT Language and acquisition lab Instructions: This python code can be used to convert SAS out

2 Jan 26, 2022
Group Fisher Pruning for Practical Network Compression(ICML2021)

Group Fisher Pruning for Practical Network Compression (ICML2021) By Liyang Liu*, Shilong Zhang*, Zhanghui Kuang, Jing-Hao Xue, Aojun Zhou, Xinjiang W

Shilong Zhang 129 Dec 13, 2022
TargetAllDomainObjects - A python wrapper to run a command on against all users/computers/DCs of a Windows Domain

TargetAllDomainObjects A python wrapper to run a command on against all users/co

Podalirius 19 Dec 13, 2022
Fuzzification helps developers protect the released, binary-only software from attackers who are capable of applying state-of-the-art fuzzing techniques

About Fuzzification Fuzzification helps developers protect the released, binary-only software from attackers who are capable of applying state-of-the-

gts3.org 55 Oct 25, 2022