Official PyTorch implementation for the paper "Efficient Two-Stage Detection of Human–Object Interactions with a Novel Unary–Pairwise Transformer"

Overview

UPT: Unary–Pairwise Transformers

This repository contains the official PyTorch implementation for the paper

Frederic Z. Zhang, Dylan Campbell and Stephen Gould. Efficient Two-Stage Detection of Human–Object Interactions with a Novel Unary–Pairwise Transformer. arXiv preprint arXiv:2112.01838.

[project page] [preprint]

Abstract

...
However, the success of such one-stage HOI detectors can largely be attributed to the representation power of transformers. We discovered that when equipped with the same transformer, their two-stage counterparts can be more performant and memory-efficient, while taking a fraction of the time to train. In this work, we propose the Unary–Pairwise Transformer, a two-stage detector that exploits unary and pairwise representations for HOIs. We observe that the unary and pairwise parts of our transformer network specialise, with the former preferentially increasing the scores of positive examples and the latter decreasing the scores of negative examples. We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches. At inference time, our model with ResNet50 approaches real-time performance on a single GPU.

Demonstration on data in the wild

Model Zoo

We provide weights for UPT models pre-trained on HICO-DET and V-COCO for potential downstream applications. We also provide weights for the fine-tuned DETR models to facilitate reproducibility. To fine-tune a DETR model yourself, refer to this repository.

Model         Dataset   Default Settings (full / rare / non-rare mAP)  Inference Time  UPT Weights  DETR Weights
UPT-R50       HICO-DET  31.66 / 25.94 / 33.36                          0.042 s         weights      weights
UPT-R101      HICO-DET  32.31 / 28.55 / 33.44                          0.061 s         weights      weights
UPT-R101-DC5  HICO-DET  32.62 / 28.62 / 33.81                          0.124 s         weights      weights

Model         Dataset   Scenario 1  Scenario 2  Inference Time  UPT Weights  DETR Weights
UPT-R50       V-COCO    59.0        64.5        0.043 s         weights      weights
UPT-R101      V-COCO    60.7        66.2        0.064 s         weights      weights
UPT-R101-DC5  V-COCO    61.3        67.1        0.131 s         weights      weights

The inference speed was benchmarked on a GeForce RTX 3090. Note that the UPT weights include those of the detector (DETR), so you do not need to download the DETR weights unless you want to train the UPT model from scratch. Training UPT-R50 on 8 GeForce GTX TITAN X GPUs takes around 5 hours on HICO-DET and 40 minutes on V-COCO, almost a tenth of the time required by one-stage models such as QPIC.
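
For downstream applications, a checkpoint can be inspected with plain PyTorch. Below is a minimal sketch; the 'model_state_dict' key is an assumption about the checkpoint layout, so check the printed keys first.

import torch

# Load a UPT checkpoint on the CPU and inspect its contents.
checkpoint = torch.load("checkpoints/upt-r50-hicodet.pt", map_location="cpu")
print(checkpoint.keys())

# Assuming the weights live under 'model_state_dict' (an assumption,
# not confirmed by the repository), list a few parameter names and shapes.
state_dict = checkpoint["model_state_dict"]
for name, param in list(state_dict.items())[:5]:
    print(name, tuple(param.shape))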

Contact

For general inquiries regarding the paper and code, please post in Discussions. For bug reports and feature requests, please open an Issue. You can also contact me at [email protected].

Prerequisites

  1. Install the lightweight deep learning library Pocket. The recommended PyTorch version is 1.9.0.
  2. Download the repository and its submodules (a one-step alternative is shown after this list).
     git clone https://github.com/fredzzhang/upt.git
     git submodule init
     git submodule update
  3. Prepare the HICO-DET dataset.
     a. If you have not downloaded the dataset before, run the following script.
        cd /path/to/upt/hicodet
        bash download.sh
     b. If you have previously downloaded the dataset, simply create a soft link.
        cd /path/to/upt/hicodet
        ln -s /path/to/hicodet_20160224_det ./hico_20160224_det
  4. Prepare the V-COCO dataset (contained in MS COCO).
     a. If you have not downloaded the dataset before, run the following script.
        cd /path/to/upt/vcoco
        bash download.sh
     b. If you have previously downloaded the dataset, simply create a soft link.
        cd /path/to/upt/vcoco
        ln -s /path/to/coco ./mscoco2014
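
Alternatively, the repository and its submodules can be downloaded in a single step, equivalent to the three commands in step 2:

git clone --recursive https://github.com/fredzzhang/upt.git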

License

UPT is released under the BSD-3-Clause License.

Inference

We have implemented inference utilities with different visualisation options. Provided you have downloaded the model weights to checkpoints/, run the following command to visualise detected instances together with the attention maps from the cooperative and competitive layers. Use the flag --index to select images, and --box-score-thresh to modify the filtering threshold on object boxes.

python inference.py --resume checkpoints/upt-r50-hicodet.pt --index 8789
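
For example, to lower the filtering threshold on object boxes for the same image (the threshold value here is purely illustrative):

python inference.py --resume checkpoints/upt-r50-hicodet.pt --index 8789 --box-score-thresh 0.2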

Here is a sample output. Note that we manually selected some informative attention maps to display. The script also prints the predicted scores for each action.

To select the V-COCO dataset and V-COCO models, use the flag --dataset vcoco, and then load the corresponding weights. To visualise interactive human-object pairs for a particular action class, use the flag --action to specify the action index. Here is a lookup table for the action indices.
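
For example, assuming the V-COCO weights have been downloaded to checkpoints/, a command along the following lines visualises pairs for one action class (the image and action indices are illustrative):

python inference.py --dataset vcoco --resume checkpoints/upt-r50-vcoco.pt --index 20 --action 1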

Additionally, to cater for different needs, we implemented an option to run inference on custom images using the flag --image-path. The following is an example for the interaction "holding an umbrella".

python inference.py --resume checkpoints/upt-r50-hicodet.pt --image-path ./assets/umbrella.jpeg --action 36

Training and Testing

Refer to launch_template.sh for training and testing commands with different options. To train the UPT model from scratch, you need to download the weights for the corresponding DETR model, and place them under /path/to/upt/checkpoints/. Adjust --world-size based on the number of GPUs available.
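
As a concrete sketch, a single-GPU HICO-DET run could look like the following (paths are illustrative and mirror a command reported in the comments below; see launch_template.sh for the authoritative options):

python main.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/upt-r50-hicodet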

To test the UPT model on HICO-DET, you can either use the Python utilities we implemented or the Matlab utilities provided by Chao et al. For V-COCO, we did not implement evaluation utilities; instead, use the utilities provided by Gupta et al. Refer to these instructions for more details.
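
For reference, cached V-COCO detections in the format expected by the official evaluator can be produced with the --cache flag, as in this example adapted from the comments below (paths are illustrative):

python main.py --cache --dataset vcoco --data-root vcoco/ --partitions trainval test --output-dir vcoco-r50 --resume checkpoints/upt-r50-vcoco.pt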

Citation

If you find our work useful for your research, please consider citing us:

@article{zhang2021upt,
  author    = {Frederic Z. Zhang and Dylan Campbell and Stephen Gould},
  title     = {Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer},
  journal   = {arXiv preprint arXiv:2112.01838},
  year      = {2021}
}

@inproceedings{zhang2021scg,
  author    = {Frederic Z. Zhang and Dylan Campbell and Stephen Gould},
  title     = {Spatially Conditioned Graphs for Detecting Human--Object Interactions},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {13319--13327}
}
Comments
  • Error when testing V-COCO

    I used python main.py --cache --dataset vcoco --data-root vcoco/ --partitions trainval test --output-dir vcoco-r50 --resume checkpoints/upt-r50-vcoco.pt to generate cache.pkl, but an error is reported when evaluating it.

    The eval code is:

    from vsrl_eval import VCOCOeval
    
    vsrl_annot_file = 'data/vcoco/vcoco_val.json'
    coco_file = 'data/instances_vcoco_all_2014.json'
    split_file = 'data/splits/vcoco_val.ids'
    
    vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)
    
    det_file = '/media/ming-t/Deng/relation_mppe/HOI-UPT/vcoco-r50/cache.pkl'
    vcocoeval._do_eval(det_file, ovr_thresh=0.5)
    

    The error is:

    loading annotations into memory...
    Done (t=0.74s)
    creating index...
    index created!
    loading vcoco annotations...
    Traceback (most recent call last):
      File "test.py", line 14, in <module>
        vcocoeval._do_eval(det_file, ovr_thresh=0.5)
      File "/media/ming-t/Deng/relation_mppe/HOI-UPT/lib/vcoco/vsrl_eval.py", line 194, in _do_eval
        self._do_agent_eval(vcocodb, detections_file, ovr_thresh=ovr_thresh)
      File "/media/ming-t/Deng/relation_mppe/HOI-UPT/lib/vcoco/vsrl_eval.py", line 417, in _do_agent_eval
        assert(np.amax(rec) <= 1)
      File "<__array_function__ internals>", line 180, in amax
      File "/home/ming-t/anaconda3/envs/pocket/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2793, in amax
        return _wrapreduction(a, np.maximum, 'max', axis, None, out,
      File "/home/ming-t/anaconda3/envs/pocket/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
        return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
    ValueError: zero-size array to reduction operation maximum which has no identity
    

    How to solve it?

    opened by leijue222 14
  • The HOI loss is NaN for rank 0

    Dear sir, I followed the README to build the UPT network, but when I ran the command python main.py --world-size 1 --dataset vcoco --data-root ./v-coco --partitions trainval test --pretrained ../detr-r50-vcoco.pth --output-dir ./upt-r50-vcoco.pt

    I got the following error:

    Traceback (most recent call last):
      File "main.py", line 208, in <module>
        mp.spawn(main, nprocs=args.world_size, args=(args,))
      File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
        while not context.join():
      File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
        raise ProcessRaisedException(msg, error_index, failed_process.pid)
    torch.multiprocessing.spawn.ProcessRaisedException:

    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
      File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
        fn(i, *args)
      File "/root/autodl-tmp/upload/main.py", line 125, in main
        engine(args.epochs)
      File "/root/pocket/pocket/pocket/core/distributed.py", line 139, in __call__
        self._on_each_iteration()
      File "/root/autodl-tmp/upload/utils.py", line 138, in _on_each_iteration
        raise ValueError(f"The HOI loss is NaN for rank {self._rank}")
    ValueError: The HOI loss is NaN for rank 0

    I tried to train without the pretrained model and got the same error. I tried to print the loss, but it showed an empty tensor. As a beginner, I have no idea what happened. Any help would be greatly appreciated. I look forward to your reply. Thank you very much.

    Inactive 
    opened by OBVIOUSDAWN 11
  • Generate the results on the friends.gif

    Hello! Thank you for this amazing work! I am curious how you got the inference results showing the names of the objects and the activities on the demo_friends.gif. Can you please explain how you achieved that? Thanks in advance.

    opened by Andre1998Shuvam 8
  • Predicted object class instead of just a number?

    Hi,

    thanks for the amazing work! I would like to ask where we can see the predicted object class of each bounding box, or get the prediction result in triplet form? Thank you!

    enhancement 
    opened by xiaoxiaoczw 7
  • code bug?

    opened by ltttpku 6
  • There is still a problem.

    Traceback (most recent call last):
      File "inference.py", line 225, in <module>
        main(args)
      File "C:\Users\User\anaconda3\envs\colab\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "inference.py", line 150, in main
        upt = build_detector(args, conversion)
      File "C:\Users\User\PycharmProjects\hoi\UPT\upt.py", line 276, in build_detector
        detr.backbone[0].num_channels,
      File "C:\Users\User\anaconda3\envs\colab\lib\site-packages\torch\nn\modules\module.py", line 1207, in __getattr__
        raise AttributeError("'{}' object has no attribute '{}'".format(
    AttributeError: 'DETRsegm' object has no attribute 'backbone'


    I changed the torch version and tried it in the Colab environment, but the problem still occurs in the same place.

    If possible, can you share all the library versions you use via "pip freeze > requirements.txt"?

    If it is not possible to disclose it externally, I would appreciate it if you could send it to [email protected].

    opened by ghzmwhdk777 4
  • Here is the problem

    Traceback (most recent call last):
      File "inference.py", line 225, in <module>
        main(args)
      File "C:\Users\User\anaconda3\envs\colab\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "inference.py", line 150, in main
        upt = build_detector(args, conversion)
      File "C:\Users\User\PycharmProjects\hoi\UPT\upt.py", line 268, in build_detector
        detr, _, postprocessors = build_model(args)
      File "C:\Users\User\PycharmProjects\hoi\UPT\detr\models\__init__.py", line 6, in build_model
        return build(args)
      File "C:\Users\User\PycharmProjects\hoi\UPT\detr\models\detr.py", line 313, in build
        num_classes = 20 if args.dataset_file != 'coco' else 91
    AttributeError: 'Namespace' object has no attribute 'dataset_file'

    opened by ghzmwhdk777 4
  • How about training a DETR model on the V-COCO dataset?

    Thank you for your excellent work. You have provided a tutorial on training DETR models on the HICO-DET dataset; could you tell us how you trained DETR on the V-COCO dataset?

    moved to discussion 
    opened by ddwhzh 4
  • Confused about the V-COCO dataset

    There are some cool properties of the V-COCO dataset you implemented:

      1. "object_to_action" gives the list of actions for each object, i.e. {1: [0, 3, 11, 15], 2: [0, 1, 2, 3, 11], ......}
      2. "objects" returns the list of objects, i.e. ['background', 'person', 'bicycle', .......]
      3. "actions" returns the list of actions, i.e. ['hold obj', 'sit instr', 'ride instr', .......]

    However, I'm confused about the relationships among them:

    1. Which object does the key 1 of "1: [0, 3, 11, 15]", which is the first item of object_to_action, represent?
    2. Which action does the values [0, 3, 11, 15] of "1: [0, 3, 11, 15]" represent?

    According to the list of actions and objects, actions 0, 3, 11 and 15 represent hold obj, look obj, carry obj and cut obj respectively, while object 1 represents person, which appears to be weird.
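
    A quick way to cross-check the mapping (a hypothetical sketch; dataset stands for the V-COCO wrapper exposing the properties above):

    # Print the human-readable mapping implied by object_to_action.
    for obj_idx, action_indices in dataset.object_to_action.items():
        obj_name = dataset.objects[obj_idx]
        action_names = [dataset.actions[a] for a in action_indices]
        print(f"{obj_name}: {action_names}")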

    question moved to discussion 
    opened by ltttpku 3
  • train

    python main.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/upt-r50-hicodet

    raise ValueError(f"The HOI loss is NaN for rank {self._rank}")
    ValueError: The HOI loss is NaN for rank 0

    opened by wangjunbianqiang 2
  • Multiple loss training code

    Hi @fredzzhang,

    I want to try training with multiple losses. I found the relevant code and added a loss; it runs without any errors being reported.

    But I want to train with multiple losses properly and set a hyperparameter (weight) for each loss. How do I do that?

    if self.training:
        interaction_loss = self.compute_interaction_loss(
            boxes, bh, bo, logits, prior, targets, pairwise_tokens_x_collated)
        interaction_x_loss = self.compute_interaction_x_loss(
            boxes, bh, bo, logits, prior, targets, pairwise_tokens_x_collated)
        loss_dict = dict(
            interaction_loss=interaction_loss,
            interaction_x_loss=interaction_x_loss
        )
        return loss_dict
    

    def _on_each_iteration(self):
        loss_dict = self._state.net(
            *self._state.inputs, targets=self._state.targets)
        if loss_dict['interaction_loss'].isnan():
            raise ValueError(f"The HOI loss is NaN for rank {self._rank}")

        self._state.loss = sum(loss for loss in loss_dict.values())
        self._state.optimizer.zero_grad(set_to_none=True)
        self._state.loss.backward()
        if self.max_norm > 0:
            torch.nn.utils.clip_grad_norm_(self._state.net.parameters(), self.max_norm)
        self._state.optimizer.step()

    yaoyaosanqi.

    opened by yaoyaosanqi 2