Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Last update: Jan 01, 2023

Overview

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

PyTorch implementation of our paper Spatial-Temporal Transformer for Dynamic Scene Graph Generation, accepted by ICCV 2021. We propose a Transformer-based model, STTran, to generate dynamic scene graphs for a given video. STTran can detect the visual relationships in each frame.

The introduction video is available now: https://youtu.be/gKpnRU8btLg

About the code: We run the code on a single RTX 2080 Ti for both training and testing. We borrowed some code from Yang's repository and Zellers' repository.

Usage

Our code uses Python 3.6, PyTorch 1.1 and torchvision 0.3. First, clone the repository:

git clone https://github.com/yrcong/STTran.git
cd STTran

We borrow some compiled code for bounding-box (bbox) operations. Build it in place:

cd lib/draw_rectangles
python setup.py build_ext --inplace
cd ..
cd fpn/box_intersections_cpu
python setup.py build_ext --inplace
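If you prefer to script the two builds, the sketch below runs the same `setup.py build_ext --inplace` step in both directories listed above. It is a hypothetical helper, not part of the repository: the function name `build_extensions` and the `dry_run` switch are assumptions, and it assumes it is called with the repository root as `repo_root`.

```python
import subprocess
import sys
from pathlib import Path

# Extension directories (relative to the repository root) taken from the
# commands above; each one contains its own setup.py.
EXTENSION_DIRS = ("lib/draw_rectangles", "lib/fpn/box_intersections_cpu")


def build_extensions(repo_root=".", dry_run=False):
    """Run `setup.py build_ext --inplace` in every extension directory.

    Returns the list of (working_directory, command) pairs. With
    dry_run=True nothing is executed, which is handy for checking paths.
    """
    planned = []
    for rel_dir in EXTENSION_DIRS:
        workdir = Path(repo_root) / rel_dir
        # Use the current interpreter so the extensions match the active env.
        cmd = [sys.executable, "setup.py", "build_ext", "--inplace"]
        planned.append((workdir, cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a build step fails.
            subprocess.run(cmd, cwd=str(workdir), check=True)
    return planned
```

Called from the repository root, `build_extensions()` performs both builds, while `build_extensions(dry_run=True)` only lists what would run.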

For the object detector part, please follow the compilation instructions from https://github.com/jwyang/faster-rcnn.pytorch. We provide a pretrained Faster R-CNN model for Action Genome. Please download it here and put it at:

fasterRCNN/models/faster_rcnn_ag.pth

Dataset

We use the Action Genome dataset to train and evaluate our method. Please process the downloaded dataset with the Toolkit. The dataset directories should look like this:

|-- action_genome
    |-- annotations   #gt annotations
    |-- frames        #sampled frames
    |-- videos        #original videos
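A quick way to catch a misplaced dataset before a long run is to check for the three sub-directories in the tree above. This is a minimal sketch that assumes `$DATAPATH` points at the `action_genome` directory itself; `missing_dataset_dirs` and `check_dataset` are hypothetical helper names, not functions from the repository.

```python
from pathlib import Path

# Sub-directories expected directly under the Action Genome root.
EXPECTED_SUBDIRS = ("annotations", "frames", "videos")


def missing_dataset_dirs(data_path):
    """Return the expected sub-directories that do not exist under data_path."""
    root = Path(data_path)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


def check_dataset(data_path):
    """Raise FileNotFoundError if the layout does not match the tree above."""
    missing = missing_dataset_dirs(data_path)
    if missing:
        raise FileNotFoundError(
            "{} is missing: {}".format(data_path, ", ".join(missing)))
```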

In the experiments for SGCLS/SGDET, we only keep bounding boxes whose short edges are larger than 16 pixels. Please download the file object_bbox_and_relationship_filtersmall.pkl and put it in the dataloader folder.
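The downloaded .pkl file presumably already has this filter applied; the sketch below only illustrates the criterion. It assumes boxes are given as `(x1, y1, x2, y2)` pixel coordinates, and the helper names are hypothetical. "Larger than 16 pixels" is read as a strict comparison, so a box with a 16-pixel short edge is dropped.

```python
# Threshold from the text above: boxes whose short edge is not larger than
# 16 pixels are discarded for SGCLS/SGDET.
MIN_SHORT_EDGE = 16


def short_edge(box):
    """Length of the shorter side of an (x1, y1, x2, y2) box, in pixels."""
    x1, y1, x2, y2 = box
    return min(x2 - x1, y2 - y1)


def filter_small_boxes(boxes, min_short_edge=MIN_SHORT_EDGE):
    """Keep only boxes whose short edge is strictly larger than the threshold."""
    return [box for box in boxes if short_edge(box) > min_short_edge]


boxes = [(0, 0, 20, 10), (5, 5, 22, 45), (0, 0, 16, 50)]
print(filter_small_boxes(boxes))  # → [(5, 5, 22, 45)]
```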

Train

You can train STTran with train.py. We trained the model on an RTX 2080 Ti:

For PredCLS:

python train.py -mode predcls -datasize large -data_path $DATAPATH

For SGCLS:

python train.py -mode sgcls -datasize large -data_path $DATAPATH

For SGDET:

python train.py -mode sgdet -datasize large -data_path $DATAPATH
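The three commands differ only in `-mode`, and the settings differ in what the model is given: PredCLS receives ground-truth boxes and object labels, SGCLS receives ground-truth boxes only, and SGDET must detect the objects itself. The sketch below builds the same command lines from one hypothetical helper, `train_command`. The flags are copied from the commands above, and `/data/action_genome` is a placeholder for `$DATAPATH`.

```python
import shlex

# The three settings from the commands above:
#   predcls: ground-truth boxes and object labels are given
#   sgcls:   ground-truth boxes are given, objects must be classified
#   sgdet:   objects must be detected from the frames
MODES = ("predcls", "sgcls", "sgdet")


def train_command(mode, data_path, datasize="large"):
    """Build the argument list for one train.py run (flags as in the README)."""
    if mode not in MODES:
        raise ValueError("unknown mode {!r}; expected one of {}".format(mode, MODES))
    return ["python", "train.py", "-mode", mode,
            "-datasize", datasize, "-data_path", data_path]


for mode in MODES:
    # shlex.quote keeps paths with spaces safe when pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in train_command(mode, "/data/action_genome")))
```

The argument list can also be passed directly to `subprocess.run(..., check=True)` instead of being printed.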

Evaluation

You can evaluate STTran with test.py.

For PredCLS (trained model):

python test.py -mode predcls -datasize large -data_path $DATAPATH -model_path $MODELPATH

For SGCLS (trained model):

python test.py -mode sgcls -datasize large -data_path $DATAPATH -model_path $MODELPATH

For SGDET (trained model):

python test.py -mode sgdet -datasize large -data_path $DATAPATH -model_path $MODELPATH
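When you have one trained checkpoint per setting, you can run the evaluations in a batch. This is a minimal sketch with hypothetical names (`eval_command`, `evaluate_all`). It assumes that test.py accepts the same `-mode` flag that train.py uses and that `checkpoints` maps each mode to the path you would pass as `$MODELPATH`. By default it only returns the commands, so you can review them before running anything.

```python
import subprocess

MODES = ("predcls", "sgcls", "sgdet")


def eval_command(mode, data_path, model_path, datasize="large"):
    """Build the argument list for one test.py run.

    Assumption: test.py accepts the same -mode flag that train.py uses.
    """
    if mode not in MODES:
        raise ValueError("unknown mode {!r}; expected one of {}".format(mode, MODES))
    return ["python", "test.py", "-mode", mode, "-datasize", datasize,
            "-data_path", data_path, "-model_path", model_path]


def evaluate_all(data_path, checkpoints, dry_run=True):
    """Evaluate one checkpoint per setting; checkpoints maps mode -> model path."""
    commands = [eval_command(mode, data_path, checkpoints[mode])
                for mode in MODES if mode in checkpoints]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first evaluation that fails.
            subprocess.run(cmd, check=True)
    return commands
```

Settings missing from `checkpoints` are skipped, so you can evaluate a subset, e.g. only PredCLS and SGDET.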

Citation

If our work is helpful for your research, please cite our publication:

@inproceedings{cong2021spatial,
  title={Spatial-Temporal Transformer for Dynamic Scene Graph Generation},
  author={Cong, Yuren and Liao, Wentong and Ackermann, Hanno and Rosenhahn, Bodo and Yang, Michael Ying},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year={2021},
  url={https://arxiv.org/abs/2107.12309}
}

Help

If you have any questions or ideas about the code or the paper, please comment on GitHub or send us an email. We will reply as soon as possible.


Owner

Yuren Cong
