[ICCV 2021] Target Adaptive Context Aggregation for Video Scene Graph Generation

This is a PyTorch implementation for Target Adaptive Context Aggregation for Video Scene Graph Generation.

Requirements

  • PyTorch >= 1.2 (tested with 1.7.1, CUDA 10.1)
  • torchvision >= 0.4 (tested with 0.8.2, CUDA 10.1)
  • cython
  • matplotlib
  • numpy
  • scipy
  • opencv
  • pyyaml
  • packaging
  • pycocotools
  • tensorboardX
  • tqdm
  • pillow
  • scikit-image
  • h5py
  • yacs
  • ninja
  • overrides
  • mmcv
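
To confirm that an environment satisfies the two version constraints above, a small check along the following lines may help. It is only a sketch: the helper names (version_tuple, meets_minimum, check_environment) are made up for illustration and are not part of this repository, and the comparison only looks at the numeric parts of a version string.

```python
import re
from importlib import metadata

# Minimum versions taken from the requirements list above.
MINIMUMS = {"torch": "1.2", "torchvision": "0.4"}


def version_tuple(text):
    """Turn '1.7.1+cu101' into (1, 7, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Return {package: status} for every package that has a minimum version."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = installed + " (ok)"
        else:
            report[name] = installed + " (too old, need " + minimum + " or newer)"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(package + ": " + status)
```

The remaining packages in the list carry no version constraint, so an import test is enough for them.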

Compilation

Compile the CUDA code in the Detectron submodule and in the repo:

# ROOT=path/to/cloned/repository
cd $ROOT/Detectron_pytorch/lib
sh make.sh
cd $ROOT/lib
sh make.sh

Data Preparation

Download Datasets

Download links: VidVRD and AG.

Create directories for datasets. The directories for ./data/ should look like:

|-- data
|   |-- ag
|   |-- vidvrd
|   |-- obj_embed

where ag and vidvrd hold the AG and VidVRD datasets, and obj_embed holds GloVe, the pre-trained word-vector weights. The final GloVe directory should look like:

|-- obj_embed
|   |-- glove.6B.200d.pt
|   |-- glove.6B.300d.pt
|   |-- glove.6B.300d.txt
|   |-- glove.6B.200d.txt
|   |-- glove.6B.100d.txt
|   |-- glove.6B.50d.txt
|   |-- glove.6B.300d

AG

Put the .mp4 files into ./data/ag/videos/. Put the annotations into ./data/ag/annotations/.

The final directories for the AG dataset should look like:

|-- ag
|   |-- annotations
|   |   |-- object_classes.txt
|   |   |-- ...
|   |-- videos
|   |   |-- ....mp4
|   |-- Charades_annotations

VidVRD

Put the .mp4 files into ./data/vidvrd/videos/. Put the three folders test, train and videos from vidvrd-annotations into ./data/vidvrd/annotations/.

Download the precomputed features, model and detected relations from here (or here). Extract the features and models into ./data/vidvrd/.

The final directories for VidVRD dataset should look like:

|-- vidvrd
|   |-- annotations
|   |   |-- test
|   |   |-- train
|   |   |-- videos
|   |   |-- predicate.txt
|   |   |-- object.txt
|   |   |-- ...
|   |-- features
|   |   |-- relation
|   |   |-- traj_cls
|   |   |-- traj_cls_gt
|   |-- models
|   |   |-- baseline_setting.json
|   |   |-- ...
|   |-- videos
|   |   |-- ILSVRC2015_train_00005003.mp4
|   |   |-- ...

Change the format of annotations for AG and VidVRD

# ROOT=path/to/cloned/repository
cd $ROOT

python tools/rename_ag.py

python tools/rename_vidvrd_anno.py

python tools/get_vidvrd_pretrained_rois.py --out_rpath pre_processed_boxes_gt_dense_more --rpath traj_cls_gt

python tools/get_vidvrd_pretrained_rois.py --out_rpath pre_processed_boxes_dense_more
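
The four conversion steps above have to run in order from the repository root. A small driver such as the following can run them and stop at the first failure. It is a sketch: the script paths and flags are copied from the commands above, while run_steps and its dry_run switch are hypothetical conveniences.

```python
import subprocess
import sys

# Script paths and flags copied from the commands listed above.
PREPROCESS_STEPS = [
    ["tools/rename_ag.py"],
    ["tools/rename_vidvrd_anno.py"],
    ["tools/get_vidvrd_pretrained_rois.py",
     "--out_rpath", "pre_processed_boxes_gt_dense_more",
     "--rpath", "traj_cls_gt"],
    ["tools/get_vidvrd_pretrained_rois.py",
     "--out_rpath", "pre_processed_boxes_dense_more"],
]


def run_steps(repo_root, steps=PREPROCESS_STEPS, dry_run=True):
    """Print each command and, unless dry_run is set, run it from repo_root."""
    commands = [[sys.executable, *step] for step in steps]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, cwd=repo_root, check=True)
    return commands


if __name__ == "__main__":
    run_steps(".", dry_run=True)
```

With dry_run=True nothing is executed, which makes it easy to review the commands before committing to a long preprocessing run.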

Dump frames

Our ffmpeg version is 4.2.2-0york0~16.04, so we pass --ignore_editlist to avoid some frames being skipped. Storing frames in jpg format saves disk space.

Dump the annotated frames for AG and VidVRD.

python tools/dump_frames.py --ignore_editlist

python tools/dump_frames.py --ignore_editlist --video_dir data/vidvrd/videos --frame_dir data/vidvrd/frames --frame_list_file val_fname_list.json,train_fname_list.json --annotation_dir data/vidvrd/annotations --st_id 0

Dump the sampled high quality frames for AG and VidVRD.

python tools/dump_frames.py --frame_dir data/ag/sampled_frames --ignore_editlist --frames_store_type jpg --high_quality --sampled_frames

python tools/dump_frames.py --ignore_editlist --video_dir data/vidvrd/videos --frame_dir data/vidvrd/sampled_frames --frame_list_file val_fname_list.json,train_fname_list.json --annotation_dir data/vidvrd/annotations --frames_store_type jpg --high_quality --sampled_frames --st_id 0

To dump all frames in jpg format:

python tools/dump_frames.py --all_frames --frame_dir data/ag/all_frames --ignore_editlist --frames_store_type jpg
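
For readers curious what the --ignore_editlist switch may translate to, ffmpeg's mov/mp4 demuxer has an input option called ignore_editlist. The sketch below builds a plain ffmpeg command that uses it. It is an assumption about the general approach, not a copy of tools/dump_frames.py: the function name, the %06d.jpg naming pattern, and the quality value 2 are illustrative choices.

```python
import shlex


def ffmpeg_dump_command(video_path, frame_dir, ignore_editlist=True,
                        high_quality=False, start_number=1):
    """Build an ffmpeg argv list that writes every frame of a video as jpg."""
    command = ["ffmpeg"]
    if ignore_editlist:
        # Demuxer (input) option, so it has to appear before -i.
        command += ["-ignore_editlist", "1"]
    command += ["-i", str(video_path)]
    if high_quality:
        # For jpg output a lower -q:v means higher quality; 2 is an assumed value.
        command += ["-q:v", "2"]
    # %06d.jpg is a hypothetical naming pattern, not necessarily the repo's.
    command += ["-start_number", str(start_number), f"{frame_dir}/%06d.jpg"]
    return command


if __name__ == "__main__":
    print(shlex.join(ffmpeg_dump_command(
        "data/vidvrd/videos/ILSVRC2015_train_00005003.mp4",
        "data/vidvrd/frames/ILSVRC2015_train_00005003",
        start_number=0)))
```

Placing the option before -i matters because ffmpeg applies options to the next input or output file that follows them.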

Get classes in json format for AG

# ROOT=path/to/cloned/repository
cd $ROOT
python txt2json.py
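
As an illustration of this kind of conversion, the sketch below reads a class list with one name per line and writes it as a JSON array. The input layout and the output shape are assumptions made for the example; the actual txt2json.py may read other files or emit a different structure, and classes_to_json is a hypothetical name.

```python
import json
from pathlib import Path


def classes_to_json(txt_path, json_path):
    """Read one class name per line and write the names as a JSON list."""
    lines = Path(txt_path).read_text(encoding="utf-8").splitlines()
    # Skip blank lines and strip surrounding whitespace.
    classes = [line.strip() for line in lines if line.strip()]
    Path(json_path).write_text(json.dumps(classes, indent=2), encoding="utf-8")
    return classes


if __name__ == "__main__":
    source = Path("data/ag/annotations/object_classes.txt")
    if source.exists():
        # The output file name here is an illustrative choice.
        print(classes_to_json(source, source.with_suffix(".json")))
```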

Get Charades train/test split for AG

Download Charades annotations and extract the annotations into ./data/ag/Charades_annotations/. Then run,

# ROOT=path/to/cloned/repository
cd $ROOT
python tools/dataset_split.py
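
The idea behind a Charades-based split can be sketched as follows: every video id listed in the Charades training CSV goes to the training set and every id in the test CSV goes to the test set. This assumes the CSV files carry an id column (as in the Charades_v1_train.csv and Charades_v1_test.csv files of the public release); the function names are hypothetical and tools/dataset_split.py may work differently.

```python
import csv
import io


def video_ids(csv_file):
    """Collect the 'id' column from an open Charades-style CSV file."""
    return [row["id"] for row in csv.DictReader(csv_file)]


def split_videos(train_csv, test_csv):
    """Map each video id to 'train' or 'test'."""
    split = {vid: "train" for vid in video_ids(train_csv)}
    split.update({vid: "test" for vid in video_ids(test_csv)})
    return split


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real CSV files (made-up ids).
    train = io.StringIO("id,subject\nAAA01,s1\nBBB02,s2\n")
    test = io.StringIO("id,subject\nCCC03,s3\n")
    print(split_videos(train, test))
```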

Pretrained Models

Download model weights from here.

  • pretrained object detection
  • TRACE trained on VidVRD in detection_models/vidvrd/trained_rel
  • TRACE trained on AG in detection_models/ag/trained_rel

Performance

VidVRD, gt box

Method   mAP    Recall@50   Recall@100
TRACE    30.6   19.3        24.6

[Figure: gt_vidvrd]

VidVRD, detected box

Method   mAP    Recall@50   Recall@100
TRACE    16.3   9.2         11.2
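
As a reminder of what a recall-at-K number measures, the sketch below computes the fraction of ground-truth relation triplets that appear among the K highest-scoring predictions. It is deliberately simplified: triplets are matched by label only, whereas the real VidVRD evaluation also requires the predicted trajectories to overlap the ground truth. The function name and the toy data are illustrative.

```python
def recall_at_k(predictions, ground_truth, k):
    """Fraction of ground-truth triplets found among the k best predictions.

    predictions: list of (score, (subject, predicate, object)) pairs.
    ground_truth: list of (subject, predicate, object) triplets.
    """
    targets = set(ground_truth)
    if not targets:
        return 0.0
    ranked = sorted(predictions, key=lambda item: item[0], reverse=True)[:k]
    top = {triplet for _, triplet in ranked}
    hits = sum(1 for triplet in targets if triplet in top)
    return hits / len(targets)


if __name__ == "__main__":
    preds = [
        (0.9, ("person", "ride", "bicycle")),
        (0.8, ("dog", "chase", "person")),
        (0.1, ("person", "hold", "cup")),
    ]
    truth = [("person", "ride", "bicycle"), ("person", "hold", "cup")]
    print(recall_at_k(preds, truth, 2))  # 0.5
    print(recall_at_k(preds, truth, 3))  # 1.0
```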

[Figure: det_vidvrd]

AG, detected box

[Figure: det_ag]

Training Relationship Detection Models

VidVRD

# ROOT=path/to/cloned/repository
cd $ROOT

CUDA_VISIBLE_DEVICES=0 python tools/train_net_step_rel.py --dataset vidvrd --cfg configs/vidvrd/vidvrd_res101xi3d50_all_boxes_sample_train_flip_dc5_2d_new.yaml --nw 8 --use_tfboard --disp_interval 20 --o SGD --lr 0.025

AG

# ROOT=path/to/cloned/repository
cd $ROOT

CUDA_VISIBLE_DEVICES=0 python tools/train_net_step_rel.py --dataset ag --cfg configs/ag/res101xi3d50_dc5_2d.yaml --nw 8 --use_tfboard --disp_interval 20 --o SGD --lr 0.01
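
The two training commands differ only in the dataset name, the config file and the learning rate. A small table-driven helper makes that explicit. This is a sketch: the flags and values are copied from the commands above, while TRAIN_SETTINGS and train_command are hypothetical names.

```python
import shlex

# Per-dataset settings copied from the training commands above.
TRAIN_SETTINGS = {
    "vidvrd": {
        "cfg": "configs/vidvrd/vidvrd_res101xi3d50_all_boxes_sample_train_flip_dc5_2d_new.yaml",
        "lr": "0.025",
    },
    "ag": {
        "cfg": "configs/ag/res101xi3d50_dc5_2d.yaml",
        "lr": "0.01",
    },
}


def train_command(dataset, workers=8, disp_interval=20):
    """Return the argv list for training on 'vidvrd' or 'ag'."""
    settings = TRAIN_SETTINGS[dataset]
    return [
        "python", "tools/train_net_step_rel.py",
        "--dataset", dataset,
        "--cfg", settings["cfg"],
        "--nw", str(workers),
        "--use_tfboard",
        "--disp_interval", str(disp_interval),
        "--o", "SGD",
        "--lr", settings["lr"],
    ]


if __name__ == "__main__":
    for name in TRAIN_SETTINGS:
        print(shlex.join(train_command(name)))
```

The GPU selection (CUDA_VISIBLE_DEVICES) is left to the calling shell, as in the commands above.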

Evaluating Relationship Detection Models

VidVRD

Evaluation for gt boxes

CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7 python tools/test_net_rel.py --dataset vidvrd --cfg configs/vidvrd/vidvrd_res101xi3d50_gt_boxes_dc5_2d_new.yaml --load_ckpt Outputs/vidvrd_res101xi3d50_all_boxes_sample_train_flip_dc5_2d_new/Aug01-16-20-06_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step12999.pth --output_dir Outputs/vidvrd_new101 --do_val --multi-gpu-testing

python tools/transform_vidvrd_results.py --input_dir Outputs/vidvrd_new101 --output_dir Outputs/vidvrd_new101 --is_gt_traj

python tools/test_vidvrd.py --prediction Outputs/vidvrd_new101/baseline_relation_prediction.json --groundtruth data/vidvrd/annotations/test_gt.json

Evaluation for detected boxes

CUDA_VISIBLE_DEVICES=1 python tools/test_net_rel.py --dataset vidvrd --cfg configs/vidvrd/vidvrd_res101xi3d50_pred_boxes_flip_dc5_2d_new.yaml --load_ckpt Outputs/vidvrd_res101xi3d50_all_boxes_sample_train_flip_dc5_2d_new/Aug01-16-20-06_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step12999.pth --output_dir Outputs/vidvrd_new101_det2 --do_val

python tools/transform_vidvrd_results.py --input_dir Outputs/vidvrd_new101_det2 --output_dir Outputs/vidvrd_new101_det2

python tools/test_vidvrd.py --prediction Outputs/vidvrd_new101_det2/baseline_relation_prediction.json --groundtruth data/vidvrd/annotations/test_gt.json
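
Both VidVRD evaluations follow the same three steps: run the model, transform its outputs, then score the transformed predictions. The sketch below assembles those three commands for a given config, checkpoint and output directory. Flags and file names are taken from the commands above; the function itself is a hypothetical helper, and multi-GPU options are left out for brevity.

```python
def vidvrd_eval_commands(cfg, ckpt, output_dir, gt_boxes=False):
    """Return the three argv lists (test, transform, score) for VidVRD."""
    test = [
        "python", "tools/test_net_rel.py",
        "--dataset", "vidvrd",
        "--cfg", cfg,
        "--load_ckpt", ckpt,
        "--output_dir", output_dir,
        "--do_val",
    ]
    transform = [
        "python", "tools/transform_vidvrd_results.py",
        "--input_dir", output_dir,
        "--output_dir", output_dir,
    ]
    if gt_boxes:
        # Only the gt-box commands above pass this flag.
        transform.append("--is_gt_traj")
    score = [
        "python", "tools/test_vidvrd.py",
        "--prediction", f"{output_dir}/baseline_relation_prediction.json",
        "--groundtruth", "data/vidvrd/annotations/test_gt.json",
    ]
    return [test, transform, score]


if __name__ == "__main__":
    steps = vidvrd_eval_commands(
        "configs/vidvrd/vidvrd_res101xi3d50_gt_boxes_dc5_2d_new.yaml",
        "path/to/model.pth",  # placeholder checkpoint path
        "Outputs/vidvrd_new101",
        gt_boxes=True,
    )
    for step in steps:
        print(" ".join(step))
```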

AG

Evaluation for detected boxes, Recalls (SGDet)

CUDA_VISIBLE_DEVICES=4 python tools/test_net_rel.py --dataset ag --cfg configs/ag/res101xi3d50_dc5_2d.yaml --load_ckpt Outputs/res101xi3d50_dc5_2d/Nov01-21-50-49_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step177329.pth --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --do_val

# evaluation for detected boxes, mRecalls
python tools/visualize.py  --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --num 60000 --no_do_vis --rel_class_recall

Evaluation for detected boxes, mAP_{rel}

CUDA_VISIBLE_DEVICES=4 python tools/test_net_rel.py --dataset ag --cfg configs/ag/res101xi3d50_dc5_2d.yaml --load_ckpt Outputs/res101xi3d50_dc5_2d/Nov01-21-50-49_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step177329.pth --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --do_val --eva_map --topk 50

Evaluation for gt boxes, Recalls (SGCls)

CUDA_VISIBLE_DEVICES=4 python tools/test_net_rel.py --dataset ag --cfg configs/ag/res101xi3d50_dc5_2d.yaml --load_ckpt Outputs/res101xi3d50_dc5_2d/Nov01-21-50-49_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step177329.pth --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --do_val --use_gt_boxes

# evaluation for gt boxes, mRecalls
python tools/visualize.py  --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --num 60000 --no_do_vis --rel_class_recall

Evaluation for gt boxes, gt object labels, Recalls (PredCls)

CUDA_VISIBLE_DEVICES=4 python tools/test_net_rel.py --dataset ag --cfg configs/ag/res101xi3d50_dc5_2d.yaml --load_ckpt Outputs/res101xi3d50_dc5_2d/Nov01-21-50-49_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step177329.pth --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --do_val --use_gt_boxes --use_gt_labels

# evaluation for gt boxes and gt object labels, mRecalls
python tools/visualize.py  --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --num 60000 --no_do_vis --rel_class_recall
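
The AG evaluation commands above share one base command and differ only in a few trailing flags. The mapping can be written down as a small sketch: the flags are copied from the commands above, while AG_MODE_FLAGS and ag_eval_command are hypothetical names introduced for illustration.

```python
# Extra flags per evaluation protocol, as used in the commands above.
AG_MODE_FLAGS = {
    "sgdet": [],                                      # detected boxes
    "sgcls": ["--use_gt_boxes"],                      # gt boxes
    "predcls": ["--use_gt_boxes", "--use_gt_labels"], # gt boxes and gt labels
}


def ag_eval_command(mode, cfg, ckpt, output_dir, eval_map=False, topk=50):
    """Return the argv list for one AG evaluation run."""
    command = [
        "python", "tools/test_net_rel.py",
        "--dataset", "ag",
        "--cfg", cfg,
        "--load_ckpt", ckpt,
        "--output_dir", output_dir,
        "--do_val",
    ]
    command += AG_MODE_FLAGS[mode.lower()]
    if eval_map:
        # mAP_{rel} evaluation as in the detected-box command above.
        command += ["--eva_map", "--topk", str(topk)]
    return command


if __name__ == "__main__":
    for mode in AG_MODE_FLAGS:
        print(" ".join(ag_eval_command(
            mode, "configs/ag/res101xi3d50_dc5_2d.yaml",
            "path/to/model.pth",  # placeholder checkpoint path
            "Outputs/ag_val")))
```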

Hint

  • We currently apply dilated convolution in I3D, but observe a gridding effect in the temporal feature maps.
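
The gridding effect can be illustrated without any deep learning library. With kernel size 3 and dilation d, each layer only reads frames at offsets -d, 0 and +d, so stacking layers with the same dilation never touches the frames in between. The sketch below enumerates the reachable frame offsets; it is a toy model of the phenomenon, not code from this repository, and receptive_offsets is a hypothetical name.

```python
def receptive_offsets(num_layers, dilation, kernel_size=3):
    """Frame offsets that can influence one output position.

    Models a stack of 1D temporal convolutions that all share the same
    kernel size and dilation.
    """
    half = kernel_size // 2
    taps = [dilation * step for step in range(-half, half + 1)]
    offsets = {0}
    for _ in range(num_layers):
        offsets = {offset + tap for offset in offsets for tap in taps}
    return sorted(offsets)


if __name__ == "__main__":
    print(receptive_offsets(2, 1))  # [-2, -1, 0, 1, 2]
    print(receptive_offsets(2, 2))  # [-4, -2, 0, 2, 4]
```

With dilation 2 the odd offsets never appear, which is the checkerboard-like sampling pattern usually meant by gridding.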

Acknowledgements

This project is built on top of ContrastiveLosses4VRD, ActionGenome and VidVRD-helper. The corresponding papers are, respectively, "Graphical Contrastive Losses for Scene Graph Parsing", "Action Genome: Actions as Compositions of Spatio-temporal Scene Graphs" and "Video Visual Relation Detection".

Citing

If you use this code in your research, please use the following BibTeX entry.

@inproceedings{Target_Adaptive_Context_Aggregation_for_Video_Scene_Graph_Generation,
  author    = {Yao Teng and
               Limin Wang and
               Zhifeng Li and
               Gangshan Wu},
  title     = {Target Adaptive Context Aggregation for Video Scene Graph Generation},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages     = {13688--13697},
  year      = {2021}
}
Owner
Multimedia Computing Group, Nanjing University