A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

Related tags

Deep LearningWSSGG
Overview

README.md shall be finished soon.

WSSGG

0 Overview

Our model uses the image's paired caption as weak supervision to learn the entities in the image and the relations among them. At inference time, it generates scene graphs without help from texts. To learn our model, we first allow context information to propagate on the text graph to enrich the entity word embeddings (Sec. 3.1). We found this enrichment provides better localization of the visual objects. Then, we optimize a text-query-guided attention model (Sec. 3.2) to provide the image-level entity prediction and associate the text entities with visual regions best describing them. We use the joint probability to choose boxes associated with both subject and object (Sec. 3.3), then use the top scoring boxes to learn better grounding (Sec. 3.4). Finally, we use an RNN (Sec. 3.5) to capture the vision-language common-sense and refine our predictions.

1 Installation

git clone "https://github.com/yekeren/WSSGG.git" && cd "WSSGG"

We use Tensorflow 1.5 and Python 3.6.4. To continue, please ensure that at least the correct Python version is installed. requirements.txt defines the list of python packages we installed. Simply run pip install -r requirements.txt to install these packages after setting up python. Next, run protoc protos/*.proto --python_out=. to compile the required protobuf protocol files, which are used for storing configurations.

pip install -r requirements.txt
protoc protos/*.proto --python_out=.

1.1 Faster-RCNN

Our Faster-RCNN implementation relies on the Tensorflow object detection API. Users can use git clone "https://github.com/tensorflow/models.git" "tensorflow_models" && ln -s "tensorflow_models/research/object_detection" to set up. Also, don't forget to using protoc to compire the protos used by the detection API.

The specific Faster-RCNN model we use is faster_rcnn_inception_resnet_v2_atrous_lowproposals_oidv2 to keep it the same as the VSPNet. More information is in Tensorflow object detection zoo.

git clone "https://github.com/tensorflow/models.git" "tensorflow_models" 
ln -s "tensorflow_models/research/object_detection"
cd tensorflow_models/research/; protoc object_detection/protos/*.proto --python_out=.; cd -

mkdir -p "zoo"
wget -P "zoo" "http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz"
tar xzvf zoo/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz -C "zoo"

1.2 Language Parser

Though we indicate the dependency on spacy in requirements.txt, we still need to run python -m spacy download en for English. Then, we checkout the tool at SceneGraphParser by running git clone "https://github.com/vacancy/SceneGraphParser.git" && ln -s "SceneGraphParser/sng_parser"

python -m spacy download en
git clone "https://github.com/vacancy/SceneGraphParser.git"
ln -s "SceneGraphParser/sng_parser"

1.3 GloVe Embeddings

We use the pre-trained 300-D GloVe embeddings.

wget -P "zoo" "http://nlp.stanford.edu/data/glove.6B.zip"
unzip "zoo/glove.6B.zip" -d "zoo"

python "dataset-tools/export_glove_words_and_embeddings.py" \
  --glove_file "zoo/glove.6B.300d.txt" \
  --output_vocabulary_file "zoo/glove_word_tokens.txt" \
  --output_vocabulary_word_embedding_file "zoo/glove_word_vectors.npy"

2 Settings

To avoid the time-consuming Faster RCNN processes in 2.1 and 2.2, users can directly download the features we provided at the following URLs. Then, the scripts create_vg_settings.sh and create_coco_setting.sh will check the existense of the Faster-RCNN features and skip the processs if they are provided. Please note that in the following table, we assume the directory for holding the VG and COCO data to be vg-gt-cap and coco-cap.

Name URLs Please extract to directory
VG Faster-RCNN features https://storage.googleapis.com/weakly-supervised-scene-graphs-generation/vg_frcnn_proposals.zip vg-gt-cap/frcnn_proposals/
COCO Faster-RCNN features https://storage.googleapis.com/weakly-supervised-scene-graphs-generation/coco_frcnn_proposals.zip coco-cap/frcnn_proposals/

2.1 VG-GT-Graph and VG-Cap-Graph

Typing sh dataset-tools/create_vg_settings.sh "vg-gt-cap" will generate VG-related files under the folder "vg-gt-cap" (for both VG-GT-Graph and VG-Cap-Graph settings). Basically, it will download the datasets and launch the following programs under the dataset-tools directory.

Name Desc.
create_vg_frcnn_proposals.py Extract VG visual proposals using Faster-RCNN
create_vg_text_graphs.py Extract VG text graphs using Language Parser
create_vg_vocabulary Gather the VG vocabulary
create_vg_gt_graph_tf_record.py Generate TF record files for the VG-GT-Graph setting
create_vg_cap_graph_tf_record.py Generate TF record files for the VG-Cap-Graph setting

2.2 COCO-Cap-Graph

Typing sh dataset-tools/create_coco_settings.sh "coco-cap" "vg-gt-cap" will generate COCO-related files under the folder "coco-cap" (for COCO-Cap-Graph setting). Basically, it will download the datasets and launch the following programs under the dataset-tools directory. Please note that the "vg-gt-cap" directory should be created in that we need to get the split information (either Zareian et al. or Xu et al.).

Name Desc.
create_coco_frcnn_proposals.py Extract COCO visual proposals using Faster-RCNN
create_coco_text_graphs.py Extract COCO text graphs using Language Parser
create_coco_vocabulary Gather the COCO vocabulary
create_coco_cap_graph_tf_record.py Generate TF record files for the COCO-Cap-Graph setting

3 Training and Evaluation

Multi-GPUs (5 GPUs in our case) training cost less than 2.5 hours to train a single model, while single-GPU strategy requires more than 8 hours.

3.1 Multi-GPUs training

We use TF distributed training to train the models shown in our paper. For example, the following command shall create and train a model specified by the proto config file configs/GT-Graph-Zareian/base_phr_ite_seq.pbtxt, and save the trained model to a directory named "logs/base_phr_ite_seq". In train.sh, we create 1 ps, 1, chief, 3 workers, and 1 evaluator. The 6 instances are distributed on 5 GPUS (4 for training and 1 for evaluation).

sh train.sh \
  "configs/GT-Graph-Zareian/base_phr_ite_seq.pbtxt" \
  "logs/base_phr_ite_seq"

3.2 Single-GPU training

Our model can also be trained using single GPU strategy such as follow. However, we would suggest to half the learning rate or explore for better other hyper-parameters.

python "modeling/trainer_main.py" \
  --pipeline_proto "configs/GT-Graph-Zareian/base_phr_ite_seq.pbtxt" \
  --model_dir ""logs/base_phr_ite_seq""

3.3 Performance on test set

During the training process, there is an evaluator measuring the model's performance on the validation set and save the best model checkpoint. Finally, we use the following command to evaluate the saved model's performance on the test set. This evaluation process will last for 2-3 hours depends on the post-process parameters (e.g., see here). Currently, there are many kinds of stuff written in pure python, which we would later optimize to utilize GPU better to reduce the final evaluation time.

python "modeling/trainer_main.py" \
  --pipeline_proto "configs/GT-Graph-Zareian/base_phr_ite_seq.pbtxt" \
  --model_dir ""logs/base_phr_ite_seq"" \
  --job test

3.4 Primary configs and implementations

Take configs/GT-Graph-Zareian/base_phr_ite_seq.pbtxt as an example, the following configs control the model's behavior.

Name Desc. Impl.
linguistic_options Specify the phrasal context modeling, remove the section to disable it. models/cap2sg_linguistic.py
grounding_options Specify the grounding options. models/cap2sg_grounding.py
detection_options Specify the WSOD model, num_iterations to control the iterative process. models/cap2sg_detection.py
relation_options Specify the relation detection modeling. models/cap2sg_relation.py
common_sense_options Specify the sequential context modeling, remove the section to disable it. models/cap2sg_common_sense.py

4 Visualization

Please see cap2sg.ipynb.

5 Reference

If you find this project helps, please cite our CVPR2021 paper :)

@InProceedings{Ye_2021_CVPR,
  author = {Ye, Keren and Kovashka, Adriana},
  title = {Linguistic Structures as Weak Supervision for Visual Scene Graph Generation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2021}
}

Also, please take a look at our old work in ICCV2019.

@InProceedings{Ye_2019_ICCV,
  author = {Ye, Keren and Zhang, Mingda and Kovashka, Adriana and Li, Wei and Qin, Danfeng and Berent, Jesse},
  title = {Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month = {October},
  year = {2019}
}
Owner
Keren Ye
Ph.D. student at the University of Pittsburgh. I am interested in both Computer Vision and Natural Language Processing.
Keren Ye
Official implementation of "Watermarking Images in Self-Supervised Latent-Spaces"

🔍 Watermarking Images in Self-Supervised Latent-Spaces PyTorch implementation and pretrained models for the paper. For details, see Watermarking Imag

Meta Research 32 Dec 13, 2022
Hybrid Neural Fusion for Full-frame Video Stabilization

FuSta: Hybrid Neural Fusion for Full-frame Video Stabilization Project Page | Video | Paper | Google Colab Setup Setup environment for [Yu and Ramamoo

Yu-Lun Liu 430 Jan 04, 2023
EMNLP 2021 paper The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers.

Codebase for training transformers on systematic generalization datasets. The official repository for our EMNLP 2021 paper The Devil is in the Detail:

Csordás Róbert 57 Nov 21, 2022
JFB: Jacobian-Free Backpropagation for Implicit Models

JFB: Jacobian-Free Backpropagation for Implicit Models

Typal Research 28 Dec 11, 2022
Dense matching library based on PyTorch

Dense Matching A general dense matching library based on PyTorch. For any questions, issues or recommendations, please contact Prune at

Prune Truong 399 Dec 28, 2022
Tools for robust generative diffeomorphic slice to volume reconstruction

RGDSVR Tools for Robust Generative Diffeomorphic Slice to Volume Reconstructions (RGDSVR) This repository provides tools to implement the methods in t

Lucilio Cordero-Grande 0 Oct 29, 2021
Good Semi-Supervised Learning That Requires a Bad GAN

Good Semi-Supervised Learning that Requires a Bad GAN This is the code we used in our paper Good Semi-supervised Learning that Requires a Bad GAN Ziha

Zhilin Yang 177 Dec 12, 2022
OpenVINO黑客松比赛项目

Window_Guard OpenVINO黑客松比赛项目 英文名称:Window_Guard 中文名称:窗口卫士 硬件 树莓派4B 8G版本 一个磁石开关 USB摄像头(MP4视频文件也可以) 软件(库) OpenVINO RPi 使用方法 本项目使用的OPenVINO是是2021.3版本,并使用了

Tango 6 Jul 04, 2021
AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models

AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models Description

Angel de Paula 0 Jun 08, 2022
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

Molecular Sets (MOSES): A benchmarking platform for molecular generation models Deep generative models are rapidly becoming popular for the discovery

MOSES 656 Dec 29, 2022
Analysis code and Latex source of the manuscript describing the conditional permutation test of confounding bias in predictive modelling.

Git repositoty of the manuscript entitled Statistical quantification of confounding bias in predictive modelling by Tamas Spisak The manuscript descri

PNI - Predictive Neuroimaging Lab, University Hospital Essen, Germany 0 Nov 22, 2021
Self-Supervised Collision Handling via Generative 3D Garment Models for Virtual Try-On

Self-Supervised Collision Handling via Generative 3D Garment Models for Virtual Try-On [Project website] [Dataset] [Video] Abstract We propose a new g

71 Dec 24, 2022
A Deep Learning based project for creating line art portraits.

ArtLine The main aim of the project is to create amazing line art portraits. Sounds Intresting,let's get to the pictures!! Model-(Smooth) Model-(Quali

Vijish Madhavan 3.3k Jan 07, 2023
Caffe implementation for Hu et al. Segmentation for Natural Language Expressions

Segmentation from Natural Language Expressions This repository contains the Caffe reimplementation of the following paper: R. Hu, M. Rohrbach, T. Darr

10 Jul 27, 2021
Model serving at scale

Run inference at scale Cortex is an open source platform for large-scale machine learning inference workloads. Workloads Realtime APIs - respond to pr

Cortex Labs 7.9k Jan 06, 2023
Repository to run object detection on a model trained on an autonomous driving dataset.

Autonomous Driving Object Detection on the Raspberry Pi 4 Description of Repository This repository contains code and instructions to configure the ne

Ethan 51 Nov 17, 2022
Self-Correcting Quantum Many-Body Control using Reinforcement Learning with Tensor Networks

Self-Correcting Quantum Many-Body Control using Reinforcement Learning with Tensor Networks This repository contains the code and data for the corresp

Friederike Metz 7 Apr 23, 2022
Send text to girlfriend in the morning

Girlfriend Text Send text to girlfriend (or really anyone with a phone number) in the morning 1. Configure your settings in utils.py. phone_number = "

Paras Adhikary 199 Oct 25, 2022
This repository contains the implementation of the following paper: Cross-Descriptor Visual Localization and Mapping

Cross-Descriptor Visual Localization and Mapping This repository contains the implementation of the following paper: "Cross-Descriptor Visual Localiza

Mihai Dusmanu 81 Oct 06, 2022
Python implementation of "Elliptic Fourier Features of a Closed Contour"

PyEFD An Python/NumPy implementation of a method for approximating a contour with a Fourier series, as described in [1]. Installation pip install pyef

Henrik Blidh 71 Dec 09, 2022