image scene graph generation benchmark

Overview

Scene Graph Benchmark in PyTorch 1.7

This project is based on maskrcnn-benchmark

alt text

Highlights

  • Upgrad to pytorch 1.7
  • Multi-GPU training and inference
  • Batched inference: can perform inference using multiple images per batch per GPU.
  • Fast and flexible tsv dataset format
  • Remove FasterRCNN detector dependency: during relation head training, can plugin bounding boxes from any detector.
  • Provides pre-trained models for different scene graph detection algorithms (IMP, MSDN, GRCNN, Neural Motif, RelDN).
  • Provides bounding box level and relation level feature extraction functionalities
  • Provides large detector backbones (ResNxt152)

Installation

Check INSTALL.md for installation instructions.

Model Zoo and Baselines

Pre-trained models can be found in SCENE_GRAPH_MODEL_ZOO.md

Visualization and Demo

We provide a helper class to simplify writing inference pipelines using pre-trained models (Currently only support objects and attributes). Here is how we would do it. Run the following commands:

# visualize VinVL object detection
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/demo/demo_image.py --config_file sgg_configs/vgattr/vinvl_x152c4.yaml --img_file demo/woman_fish.jpg --save_file output/woman_fish_x152c4.obj.jpg MODEL.WEIGHT pretrained_model/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 TEST.IGNORE_BOX_REGRESSION False

# visualize VinVL object-attribute detection
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/demo/demo_image.py --config_file sgg_configs/vgattr/vinvl_x152c4.yaml --img_file demo/woman_fish.jpg --save_file output/woman_fish_x152c4.attr.jpg --visualize_attr MODEL.WEIGHT pretrained_model/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 TEST.IGNORE_BOX_REGRESSION False

# visualize OpenImage scene graph generation by RelDN
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/sgg_model_zoo/sgg_oi_vrd_model_zoo/RX152FPN_reldn_oi_best.pth
python tools/demo/demo_image.py --config_file sgg_configs/vrd/R152FPN_vrd_reldn.yaml --img_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.jpg --save_file output/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.reldn_relation.jpg --visualize_relation MODEL.ROI_RELATION_HEAD.DETECTOR_PRE_CALCULATED False

# visualize Visual Genome scene graph generation by neural motif
python tools/demo/demo_image.py --config_file sgg_configs/vg_vrd/rel_danfeiX_FPN50_nm.yaml --img_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.jpg --save_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa_vgnm.jpg --visualize_relation MODEL.ROI_RELATION_HEAD.DETECTOR_PRE_CALCULATED False DATASETS.LABELMAP_FILE "visualgenome/VG-SGG-dicts-danfeiX-clipped.json" DATA_DIR /home/penzhan/GitHub/maskrcnn-benchmark-1/datasets1 MODEL.ROI_RELATION_HEAD.USE_BIAS True MODEL.ROI_RELATION_HEAD.FILTER_NON_OVERLAP True MODEL.ROI_HEADS.DETECTIONS_PER_IMG 64 MODEL.ROI_RELATION_HEAD.SHARE_BOX_FEATURE_EXTRACTOR False MODEL.ROI_RELATION_HEAD.NEURAL_MOTIF.OBJ_LSTM_NUM_LAYERS 0 MODEL.ROI_RELATION_HEAD.NEURAL_MOTIF.EDGE_LSTM_NUM_LAYERS 2 TEST.IMS_PER_BATCH 2

Perform training

For the following examples to work, you need to first install this repo.

You will also need to download the dataset. Datasets can be downloaded by azcopy with following command:

path/to/azcopy copy 'https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/datasets/TASK_NAME' <target folder> --recursive

TASK_NAME could be visualgenome, openimages_v5c.

We recommend to symlink the path to the dataset to datasets/ as follows

# symlink the dataset
cd ~/github/maskrcnn-benchmark
mkdir -p datasets/openimages_v5c/
ln -s /vrd datasets/openimages_v5c/vrd

You can also prepare your own datasets.

Follow tsv dataset creation instructions tools/mini_tsv/README.md

Single GPU training

python tools/train_sg_net.py --config-file "/path/to/config/file.yaml"

This should work out of the box and is very similar to what we should do for multi-GPU training. But the drawback is that it will use much more GPU memory. The reason is that we set in the configuration files a global batch size that is divided over the number of GPUs. So if we only have a single GPU, this means that the batch size for that GPU will be 4x larger, which might lead to out-of-memory errors.

Multi-GPU training

We use internally torch.distributed.launch in order to launch multi-gpu training. This utility function from PyTorch spawns as many Python processes as the number of GPUs we want to use, and each Python process will only use a single GPU.

export NGPUS=4
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_sg_net.py --config-file "path/to/config/file.yaml" 

Evaluation

You can test your model directly on single or multiple gpus. To evaluate relations, one needs to output "relation_scores_all" in the TSV_SAVE_SUBSET. Here are a few example command line for evaluating on 4 GPUS:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH 

# vg IMP evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_imp.yaml

# vg MSDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_msdn.yaml

# vg neural motif evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_nm.yaml

# vg GRCNN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_grcnn.yaml

# vg RelDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_reldn.yaml

# oi IMP evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_imp_bias_oi.yaml

# oi MSDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_msdn_bias_oi.yaml

# oi neural motif evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_motif_oi.yaml

# oi GRCNN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_grcnn_oi.yaml

# oi RelDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vrd/R152FPN_vrd_reldn.yaml

To evaluate in sgcls mode:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_BOX_HEAD.FORCE_BOXES True MODEL.ROI_RELATION_HEAD.MODE "sgcls"

To evaluate in predcls mode:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_RELATION_HEAD.MODE "predcls"

To evaluate with ground truth bbox and ground truth pairs:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_RELATION_HEAD.FORCE_RELATIONS True

Abstractions

For more information on some of the main abstractions in our implementation, see ABSTRACTIONS.md.

Adding your own dataset

This implementation adds support for TSV style datasets. But adding support for training on a new dataset can be done as follows:

from maskrcnn_benchmark.data.datasets.relation_tsv import RelationTSVDataset

class MyDataset(RelationTSVDataset):
    def __init__(self, yaml_file, extra_fields=(), transforms=None,
            is_load_label=True, **kwargs):

        super(MyDataset, self).__init__(yaml_file, extra_fields, transforms, is_load_label, **kwargs)
    
    def your_own_function(self, idx, call=False):
        # you can overwrite function or add your own functions this way
        pass

That's it. You can also add extra fields to the boxlist, such as segmentation masks (using structures.segmentation_mask.SegmentationMask), or even your own instance type.

For a full example of how the VGTSVDataset is implemented, check maskrcnn_benchmark/data/datasets/vg_tsv.py.

Once you have created your dataset, it needs to be added in a couple of places:

Adding your own evaluation

To enable your dataset for testing, add a corresponding if statement in maskrcnn_benchmark/data/datasets/evaluation/__init__.py:

if isinstance(dataset, datasets.MyDataset):
        return your_evaluation(**args)

VinVL Feature extraction

The output feature will be encoded as base64

# extract vision features with VinVL object-attribute detection model
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml TEST.IMS_PER_BATCH 2 MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR "../maskrcnn-benchmark-1/datasets1" TEST.IGNORE_BOX_REGRESSION True MODEL.ATTRIBUTE_ON True

To extract relation features (union bounding box's feature), in yaml file, set TEST.OUTPUT_RELATION_FEATURE to True, add relation_feature in TEST.TSV_SAVE_SUBSET.

To extract bounding box features, in yaml file, set TEST.OUTPUT_FEATURE to True, add feature in TEST.TSV_SAVE_SUBSET.

Troubleshooting

If you have issues running or compiling this code, we have compiled a list of common issues in TROUBLESHOOTING.md. If your issue is not present there, please feel free to open a new issue.

Citations

Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. The BibTeX entry requires the url LaTeX package.

@misc{han2021image,
      title={Image Scene Graph Generation (SGG) Benchmark}, 
      author={Xiaotian Han and Jianwei Yang and Houdong Hu and Lei Zhang and Jianfeng Gao and Pengchuan Zhang},
      year={2021},
      eprint={2107.12604},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

maskrcnn-benchmark is released under the MIT license. See LICENSE for additional details.

Acknowledgement

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

WebDataset WebDataset is a PyTorch Dataset (IterableDataset) implementation providing efficient access to datasets stored in POSIX tar archives and us

1.1k Jan 08, 2023
Conversational text Analysis using various NLP techniques

PyConverse Let me try first Installation pip install pyconverse Usage Please try this notebook that demos the core functionalities: basic usage noteb

Rita Anjana 158 Dec 25, 2022
Single Image Random Dot Stereogram for Tensorflow

TensorFlow-SIRDS Single Image Random Dot Stereogram for Tensorflow SIRDS is a means to present 3D data in a 2D image. It allows for scientific data di

Greg Peatfield 5 Aug 10, 2022
Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation

Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation By Qiang Zhou*, Zilong Huang*, Lichao Huang, Han Shen, Yon

Forest 117 Apr 01, 2022
Hack Camera, Microphone, Location, Clipboard With Just a Link. Also, Get Many Details About Victim's Device. And So On...

An Automated Tool to Hack Victim's Camera, Microphone, Location, Clipboard. Has 2 Extra Features. Version 1.1 Update Fixed Some Major Bugs Data Saving

ToxicNoob 36 Jan 07, 2023
Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation

Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation Official PyTorch implementation for the paper Look

Rishabh Jangir 20 Nov 24, 2022
Tutorial on scikit-learn and IPython for parallel machine learning

Parallel Machine Learning with scikit-learn and IPython Video recording of this tutorial given at PyCon in 2013. The tutorial material has been rearra

Olivier Grisel 1.6k Dec 26, 2022
Goal of the project : Detecting Temporal Boundaries in Sign Language videos

MVA RecVis course final project : Goal of the project : Detecting Temporal Boundaries in Sign Language videos. Sign language automatic indexing is an

Loubna Ben Allal 6 Dec 21, 2022
Spectralformer: Rethinking hyperspectral image classification with transformers

Spectralformer: Rethinking hyperspectral image classification with transformers Danfeng Hong, Zhu Han, Jing Yao, Lianru Gao, Bing Zhang, Antonio Plaza

Danfeng Hong 102 Dec 29, 2022
This program uses trial auth token of Azure Cognitive Services to do speech synthesis for you.

🗣️ aspeak A simple text-to-speech client using azure TTS API(trial). 😆 TL;DR: This program uses trial auth token of Azure Cognitive Services to do s

Levi Zim 359 Jan 05, 2023
Architecture Patterns with Python (TDD, DDD, EDM)

architecture-traning Architecture Patterns with Python (TDD, DDD, EDM) Chapter 5. 높은 기어비와 낮은 기어비의 TDD 5.2 도메인 계층 테스트를 서비스 계층으로 옮겨야 하는가? 도메인 계층 테스트 def

minsung sim 2 Mar 04, 2022
Awesome Remote Sensing Toolkit based on PaddlePaddle.

基于飞桨框架开发的高性能遥感图像处理开发套件,端到端地完成从训练到部署的全流程遥感深度学习应用。 最新动态 PaddleRS 即将发布alpha版本!欢迎大家试用 简介 PaddleRS是遥感科研院所、相关高校共同基于飞桨开发的遥感处理平台,支持遥感图像分类,目标检测,图像分割,以及变化检测等常用遥

146 Dec 11, 2022
Fine-tuning StyleGAN2 for Cartoon Face Generation

Cartoon-StyleGAN 🙃 : Fine-tuning StyleGAN2 for Cartoon Face Generation Abstract Recent studies have shown remarkable success in the unsupervised imag

Jihye Back 520 Jan 04, 2023
Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning, NeurIPS 2021 (Spotlight)

Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning, NeurIPS 2021 (Spotlight) Abstract Due to the limited and even imbalanced dat

Hanzhe Hu 99 Dec 12, 2022
Tilted Empirical Risk Minimization (ICLR '21)

Tilted Empirical Risk Minimization This repository contains the implementation for the paper Tilted Empirical Risk Minimization ICLR 2021 Empirical ri

Tian Li 40 Nov 28, 2022
Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images"

GANInversion_with_ConsecutiveImgs Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images" https://a

QingyangXu 38 Dec 07, 2022
face_recognization (FaceNet) + TFHE (HNP) + hand_face_detection (Mediapipe)

SuperControlSystem Face_Recognization (FaceNet) 面部识别 (FaceNet) Fully Homomorphic Encryption over the Torus (HNP) 环面全同态加密 (TFHE) Hand_Face_Detection (M

liziyu0104 2 Dec 30, 2021
Deep Markov Factor Analysis (NeurIPS2021)

Deep Markov Factor Analysis (DMFA) Codes and experiments for deep Markov factor analysis (DMFA) model accepted for publication at NeurIPS2021: A. Farn

Sarah Ostadabbas 2 Dec 16, 2022
Convert Apple NeuralHash model for CSAM Detection to ONNX.

Apple NeuralHash is a perceptual hashing method for images based on neural networks. It can tolerate image resize and compression.

Asuhariet Ygvar 1.5k Dec 31, 2022
An efficient and easy-to-use deep learning model compression framework

TinyNeuralNetwork 简体中文 TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework, which contains features like neura

Alibaba 441 Dec 25, 2022