[ICCV 2021] Target Adaptive Context Aggregation for Video Scene Graph Generation

Related tags

Deep LearningTRACE
Overview

Target Adaptive Context Aggregation for Video Scene Graph Generation

This is a PyTorch implementation for Target Adaptive Context Aggregation for Video Scene Graph Generation.

Requirements

  • PyTorch >= 1.2 (Mine 1.7.1 (CUDA 10.1))
  • torchvision >= 0.4 (Mine 0.8.2 (CUDA 10.1))
  • cython
  • matplotlib
  • numpy
  • scipy
  • opencv
  • pyyaml
  • packaging
  • pycocotools
  • tensorboardX
  • tqdm
  • pillow
  • scikit-image
  • h5py
  • yacs
  • ninja
  • overrides
  • mmcv

Compilation

Compile the CUDA code in the Detectron submodule and in the repo:

# ROOT=path/to/cloned/repository
cd $ROOT/Detectron_pytorch/lib
sh make.sh
cd $ROOT/lib
sh make.sh

Data Preparation

Download Datasets

Download links: VidVRD and AG.

Create directories for datasets. The directories for ./data/ should look like:

|-- data
|   |-- ag
|   |-- vidvrd
|   |-- obj_embed

where ag and vidvrd are for AG and VidVRD datasets, and obj_embed is for GloVe, the weights of pre-trained word vectors. The final directories for GloVe should look like:

|-- obj_embed
|   |-- glove.6B.200d.pt
|   |-- glove.6B.300d.pt
|   |-- glove.6B.300d.txt
|   |-- glove.6B.200d.txt
|   |-- glove.6B.100d.txt
|   |-- glove.6B.50d.txt
|   |-- glove.6B.300d

AG

Put the .mp4 files into ./data/ag/videos/. Put the annotations into ./data/ag/annotations/.

The final directories for VidVRD dataset should look like:

|-- ag
|   |-- annotations
|   |   |-- object_classes.txt
|   |   |-- ...
|   |-- videos
|   |   |-- ....mp4
|   |-- Charades_annotations

VidVRD

Put the .mp4 files into ./data/vidvrd/videos/. Put the three documents test, train and videos from the vidvrd-annoataions into ./data/vidvrd/annotations/.

Download precomputed precomputed features, model and detected relations from here (or here). Extract features and models into ./data/vidvrd/.

The final directories for VidVRD dataset should look like:

|-- vidvrd
|   |-- annotations
|   |   |-- test
|   |   |-- train
|   |   |-- videos
|   |   |-- predicate.txt
|   |   |-- object.txt
|   |   |-- ...
|   |-- features
|   |   |-- relation
|   |   |-- traj_cls
|   |   |-- traj_cls_gt
|   |-- models
|   |   |-- baseline_setting.json
|   |   |-- ...
|   |-- videos
|   |   |-- ILSVRC2015_train_00005003.mp4
|   |   |-- ...

Change the format of annotations for AG and VidVRD

# ROOT=path/to/cloned/repository
cd $ROOT

python tools/rename_ag.py

python tools/rename_vidvrd_anno.py

python tools/get_vidvrd_pretrained_rois.py --out_rpath pre_processed_boxes_gt_dense_more --rpath traj_cls_gt

python tools/get_vidvrd_pretrained_rois.py --out_rpath pre_processed_boxes_dense_more

Dump frames

Our ffmpeg version is 4.2.2-0york0~16.04 so using --ignore_editlist to avoid some frames being ignored. The jpg format saves the drive space.

Dump the annotated frames for AG and VidVRD.

python tools/dump_frames.py --ignore_editlist

python tools/dump_frames.py --ignore_editlist --video_dir data/vidvrd/videos --frame_dir data/vidvrd/frames --frame_list_file val_fname_list.json,train_fname_list.json --annotation_dir data/vidvrd/annotations --st_id 0

Dump the sampled high quality frames for AG and VidVRD.

python tools/dump_frames.py --frame_dir data/ag/sampled_frames --ignore_editlist --frames_store_type jpg --high_quality --sampled_frames

python tools/dump_frames.py --ignore_editlist --video_dir data/vidvrd/videos --frame_dir data/vidvrd/sampled_frames --frame_list_file val_fname_list.json,train_fname_list.json --annotation_dir data/vidvrd/annotations --frames_store_type jpg --high_quality --sampled_frames --st_id 0

If you want to dump all frames with jpg format.

python tools/dump_frames.py --all_frames --frame_dir data/ag/all_frames --ignore_editlist --frames_store_type jpg

Get classes in json format for AG

# ROOT=path/to/cloned/repository
cd $ROOT
python txt2json.py

Get Charades train/test split for AG

Download Charades annotations and extract the annotations into ./data/ag/Charades_annotations/. Then run,

# ROOT=path/to/cloned/repository
cd $ROOT
python tools/dataset_split.py

Pretrained Models

Download model weights from here.

  • pretrained object detection
  • TRACE trained on VidVRD in detection_models/vidvrd/trained_rel
  • TRACE trained on AG in detection_models/ag/trained_rel

Performance

VidVrd, gt box

Method mAP [email protected] [email protected]
TRACE 30.6 19.3 24.6

gt_vidvrd

VidVrd, detected box

Method mAP [email protected] [email protected]
TRACE 16.3 9.2 11.2

det_vidvrd

AG, detected box

det_ag

Training Relationship Detection Models

VidVRD

# ROOT=path/to/cloned/repository
cd $ROOT

CUDA_VISIBLE_DEVICES=0 python tools/train_net_step_rel.py --dataset vidvrd --cfg configs/vidvrd/vidvrd_res101xi3d50_all_boxes_sample_train_flip_dc5_2d_new.yaml --nw 8 --use_tfboard --disp_interval 20 --o SGD --lr 0.025

AG

# ROOT=path/to/cloned/repository
cd $ROOT

CUDA_VISIBLE_DEVICES=0 python tools/train_net_step_rel.py --dataset ag --cfg configs/ag/res101xi3d50_dc5_2d.yaml --nw 8 --use_tfboard --disp_interval 20 --o SGD --lr 0.01

Evaluating Relationship Detection Models

VidVRD

evaluation for gt boxes

CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7 python tools/test_net_rel.py --dataset vidvrd --cfg configs/vidvrd/vidvrd_res101xi3d50_gt_boxes_dc5_2d_new.yaml --load_ckpt Outputs/vidvrd_res101xi3d50_all_boxes_sample_train_flip_dc5_2d_new/Aug01-16-20-06_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step12999.pth --output_dir Outputs/vidvrd_new101 --do_val --multi-gpu-testing

python tools/transform_vidvrd_results.py --input_dir Outputs/vidvrd_new101 --output_dir Outputs/vidvrd_new101 --is_gt_traj

python tools/test_vidvrd.py --prediction Outputs/vidvrd_new101/baseline_relation_prediction.json --groundtruth data/vidvrd/annotations/test_gt.json

evaluation for detected boxes

CUDA_VISIBLE_DEVICES=1 python tools/test_net_rel.py --dataset vidvrd --cfg configs/vidvrd/vidvrd_res101xi3d50_pred_boxes_flip_dc5_2d_new.yaml --load_ckpt Outputs/vidvrd_res101xi3d50_all_boxes_sample_train_flip_dc5_2d_new/Aug01-16-20-06_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step12999.pth --output_dir Outputs/vidvrd_new101_det2 --do_val

python tools/transform_vidvrd_results.py --input_dir Outputs/vidvrd_new101_det2 --output_dir Outputs/vidvrd_new101_det2

python tools/test_vidvrd.py --prediction Outputs/vidvrd_new101_det2/baseline_relation_prediction.json --groundtruth data/vidvrd/annotations/test_gt.json

AG

evaluation for detected boxes, Recalls (SGDet)

CUDA_VISIBLE_DEVICES=4 python tools/test_net_rel.py --dataset ag --cfg configs/ag/res101xi3d50_dc5_2d.yaml --load_ckpt Outputs/res101xi3d50_dc5_2d/Nov01-21-50-49_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step177329.pth --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --do_val

#evaluation for detected boxes, mRecalls
python tools/visualize.py  --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --num 60000 --no_do_vis --rel_class_recall

evaluation for detected boxes, mAP_{rel}

CUDA_VISIBLE_DEVICES=4 python tools/test_net_rel.py --dataset ag --cfg configs/ag/res101xi3d50_dc5_2d.yaml --load_ckpt Outputs/res101xi3d50_dc5_2d/Nov01-21-50-49_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step177329.pth --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --do_val --eva_map --topk 50

evaluation for gt boxes, Recalls (SGCls)

CUDA_VISIBLE_DEVICES=4 python tools/test_net_rel.py --dataset ag --cfg configs/ag/res101xi3d50_dc5_2d.yaml --load_ckpt Outputs/res101xi3d50_dc5_2d/Nov01-21-50-49_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step177329.pth --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --do_val --use_gt_boxes

#evaluation for detected boxes, mRecalls
python tools/visualize.py  --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --num 60000 --no_do_vis --rel_class_recall

evaluation for gt boxes, gt object labels, Recalls (PredCls)

CUDA_VISIBLE_DEVICES=4 python tools/test_net_rel.py --dataset ag --cfg configs/ag/res101xi3d50_dc5_2d.yaml --load_ckpt Outputs/res101xi3d50_dc5_2d/Nov01-21-50-49_gpuserver-11_step_with_prd_cls_v3/ckpt/model_step177329.pth --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --do_val --use_gt_boxes --use_gt_labels

#evaluation for detected boxes, mRecalls
python tools/visualize.py  --output_dir Outputs/ag_val_101_ag_dc5_jin_map_new_infer_multiatten --num 60000 --no_do_vis --rel_class_recall

Hint

  • We apply the dilation convolution in I3D now, but observe a gridding effect in temporal feature maps.

Acknowledgements

This project is built on top of ContrastiveLosses4VRD, ActionGenome and VidVRD-helper. The corresponding papers are Graphical Contrastive Losses for Scene Graph Parsing, Action Genome: Actions as Compositions of Spatio-temporal Scene Graphs and Video Visual Relation Detection.

Citing

If you use this code in your research, please use the following BibTeX entry.

@inproceedings{Target_Adaptive_Context_Aggregation_for_Video_Scene_Graph_Generation,
  author    = {Yao Teng and
               Limin Wang and
               Zhifeng Li and
               Gangshan Wu},
  title     = {Target Adaptive Context Aggregation for Video Scene Graph Generation},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages     = {13688--13697},
  year      = {2021}
}
Owner
Multimedia Computing Group, Nanjing University
Multimedia Computing Group, Nanjing University
A new data augmentation method for extreme lighting conditions.

Random Shadows and Highlights This repo has the source code for the paper: Random Shadows and Highlights: A new data augmentation method for extreme l

Osama Mazhar 35 Nov 26, 2022
PHOTONAI is a high level python API for designing and optimizing machine learning pipelines.

PHOTONAI is a high level python API for designing and optimizing machine learning pipelines. We've created a system in which you can easily select and

Medical Machine Learning Lab - University of Münster 57 Nov 12, 2022
An end-to-end implementation of intent prediction with Metaflow and other cool tools

You Don't Need a Bigger Boat An end-to-end (Metaflow-based) implementation of an intent prediction flow for kids who can't MLOps good and wanna learn

Jacopo Tagliabue 614 Dec 31, 2022
Code for the paper Task Agnostic Morphology Evolution.

Task-Agnostic Morphology Optimization This repository contains code for the paper Task-Agnostic Morphology Evolution by Donald (Joey) Hejna, Pieter Ab

Joey Hejna 18 Aug 04, 2022
Computer Vision is an elective course of MSAI, SCSE, NTU, Singapore

[AI6122] Computer Vision is an elective course of MSAI, SCSE, NTU, Singapore. The repository corresponds to the AI6122 of Semester 1, AY2021-2022, starting from 08/2021. The instructor of this course

HT. Li 5 Sep 12, 2022
Official PyTorch implementation of "Synthesis of Screentone Patterns of Manga Characters"

Manga Character Screentone Synthesis Official PyTorch implementation of "Synthesis of Screentone Patterns of Manga Characters" presented in IEEE ISM 2

Tsubota 2 Nov 20, 2021
Tutorial to set up TensorFlow Object Detection API on the Raspberry Pi

A tutorial showing how to set up TensorFlow's Object Detection API on the Raspberry Pi

Evan 1.1k Dec 26, 2022
A concise but complete implementation of CLIP with various experimental improvements from recent papers

x-clip (wip) A concise but complete implementation of CLIP with various experimental improvements from recent papers Install $ pip install x-clip Usag

Phil Wang 515 Dec 26, 2022
Official PyTorch implementation of our AAAI22 paper: TransMEF: A Transformer-Based Multi-Exposure Image Fusion Framework via Self-Supervised Multi-Task Learning. Code will be available soon.

Official-PyTorch-Implementation-of-TransMEF Official PyTorch implementation of our AAAI22 paper: TransMEF: A Transformer-Based Multi-Exposure Image Fu

117 Dec 27, 2022
A repository built on the Flow software package to explore cyber-security attacks on intelligent transportation systems.

A repository built on the Flow software package to explore cyber-security attacks on intelligent transportation systems.

George Gunter 4 Nov 14, 2022
[CVPR2021] Invertible Image Signal Processing

Invertible Image Signal Processing This repository includes official codes for "Invertible Image Signal Processing (CVPR2021)". Figure: Our framework

Yazhou XING 281 Dec 31, 2022
The best solution of the Weather Prediction track in the Yandex Shifts challenge

yandex-shifts-weather The repository contains information about my solution for the Weather Prediction track in the Yandex Shifts challenge https://re

Ivan Yu. Bondarenko 15 Dec 18, 2022
CCCL: Contrastive Cascade Graph Learning.

CCGL: Contrastive Cascade Graph Learning This repo provides a reference implementation of Contrastive Cascade Graph Learning (CCGL) framework as descr

Xovee Xu 19 Dec 05, 2022
A style-based Quantum Generative Adversarial Network

Style-qGAN A style based Quantum Generative Adversarial Network (style-qGAN) model for Monte Carlo event generation. Tutorial We have prepared a noteb

9 Nov 24, 2022
PromptDet: Expand Your Detector Vocabulary with Uncurated Images

PromptDet: Expand Your Detector Vocabulary with Uncurated Images Paper Website Introduction The goal of this work is to establish a scalable pipeline

103 Dec 20, 2022
image scene graph generation benchmark

Scene Graph Benchmark in PyTorch 1.7 This project is based on maskrcnn-benchmark Highlights Upgrad to pytorch 1.7 Multi-GPU training and inference Bat

Microsoft 303 Dec 27, 2022
The Video-based Accident Detection System built in Python

Accident-detection-system About the Project This Repository contains the Video-based Accident Detection System built in Python. Contributors Yukta Gop

SURYAVANSHI SNEHAL BALKRISHNA 50 Dec 07, 2022
Bolt Online Learning Toolbox

Bolt Online Learning Toolbox Bolt features discriminative learning of linear predictors (e.g. SVM or Logistic Regression) using fast online learning a

Peter Prettenhofer 87 Dec 12, 2022
This is an official implementation for "Self-Supervised Learning with Swin Transformers".

Self-Supervised Learning with Vision Transformers By Zhenda Xie*, Yutong Lin*, Zhuliang Yao, Zheng Zhang, Qi Dai, Yue Cao and Han Hu This repo is the

Swin Transformer 529 Jan 02, 2023
Studying Python release adoptions by looking at PyPI downloads

Analysis of version adoptions on PyPI We get PyPI download statistics via Google's BigQuery using the pypinfo tool. Usage First you need to get an acc

Julien Palard 9 Nov 04, 2022