Data and code for ICCV 2021 paper Distant Supervision for Scene Graph Generation.

Related tags

Deep LearningVisualDS
Overview

Distant Supervision for Scene Graph Generation

Data and code for ICCV 2021 paper Distant Supervision for Scene Graph Generation.

Introduction

The paper applies distant supervision to visual relation detection. The intuition of distant supervision is that possible predicates between entity pairs are highly dependent on the entity types. For example, there might be ride on, feed between human and horse in images, but it is less likely to be covering. Thus, we apply this correlation to take advantage of unlabeled data. Given the knowledge base containing possible combinations between entity types and predicates, our framework enables distantly supervised training without using any human-annotated relation data, and semi-supervised training that incorporates both human-labeled data and distantly labeled data. To build the knowledge base, we parse all possible (subject, predicate, object) triplets from Conceptual Caption dataset, resulting in a knowledge base containing 1.9M distinct relational triples.

Code

Thanks to the elegant code from Scene-Graph-Benchmark.pytorch. This project is built on their framework. There are also some differences from their settings. We show the differences in a later section.

The Illustration of Distant Supervision

alt text

Installation

Check INSTALL.md for installation instructions.

Dataset

Check DATASET.md for instructions of dataset preprocessing.

Metrics

Our metrics are directly adapted from Scene-Graph-Benchmark.pytorch.

Object Detector

Download Pre-trained Detector

In generally SGG tasks, the detector is pre-trained on the object bounding box annotations on training set. We directly use the pre-trained Faster R-CNN provided by Scene-Graph-Benchmark.pytorch, because our 20 category setting and their 50 category setting have the same training set.

After you download the Faster R-CNN model, please extract all the files to the directory /home/username/checkpoints/pretrained_faster_rcnn. To train your own Faster R-CNN model, please follow the next section.

The above pre-trained Faster R-CNN model achives 38.52/26.35/28.14 mAp on VG train/val/test set respectively.

Pre-train Your Own Detector

In this work, we do not modify the Faster R-CNN part. The training process can be referred to the origin code.

EM Algorithm based Training

All commands of training are saved in the directory cmds/. The directory of cmds looks like:

cmds/  
├── 20 
│   └── motif
│       ├── predcls
│       │   ├── ds \\ distant supervision which is weakly supervised training
│       │   │   ├── em_M_step1.sh
│       │   │   ├── em_E_step2.sh
│       │   │   ├── em_M_step2.sh
│       │   │   ├── em_M_step1_wclip.sh
│       │   │   ├── em_E_step2_wclip.sh
│       │   │   └── em_M_step2_wclip.sh
│       │   ├── semi \\ semi-supervised training 
│       │   │   ├── em_E_step1.sh
│       │   │   ├── em_M_step1.sh
│       │   │   ├── em_E_step2.sh
│       │   │   └── em_M_step2.sh
│       │   └── sup
│       │       ├── train.sh
│       │       └── val.sh
│       │
│       ├── sgcls
│       │   ...
│       │
│       ├── sgdet
│       │   ...

Generally, we use an EM algorithm based training, which means the model is trained iteratively. In E-step, we estimate the predicate label distribution between entity pairs. In M-step, we optimize the model with estimated predicate label distribution. For example, the em_E_step1 means the initialization of predicate label distribution, and in em_M_step1 the model will be optimized on the label estimation.

All checkpoints can be downloaded from MODEL_ZOO.md.

Preparation

Before running the code, you need to specify the current path as environment variable SG and the experiments' root directory as EXP.

# specify current directory as SG, e.g.:
export SG=~/VisualDS
# specify experiment directory, e.g.:
export EXP=~/exps

Weakly Supervised Training

Weakly supervised training can be done with only knowledge base or can also use external semantic signals to train a better model. As for the external semantic signals, we use currently popular CLIP to initialize the probability of possible predicates between entity pairs.

  1. w/o CLIP training for Predcls:
# no need for em_E_step1
sh cmds/20/motif/predcls/ds/em_M_step1.sh
sh cmds/20/motif/predcls/ds/em_E_step2.sh
sh cmds/20/motif/predcls/ds/em_M_step2.sh
  1. with CLIP training for Predcls:

Before training, please ensure datasets/vg/20/cc_clip_logits.pk is downloaded.

# the em_E_step1 is conducted by CLIP
sh cmds/20/motif/predcls/ds/em_M_step1_wclip.sh
sh cmds/20/motif/predcls/ds/em_E_step2_wclip.sh
sh cmds/20/motif/predcls/ds/em_M_step2_wclip.sh
  1. training for Sgcls and Sgdet:

E_step results of Predcls are directly used for Sgcls and Sgdet. Thus, there is no em_E_step.sh for Sgcls and Sgdet.

Semi-Supervised Training

In semi-supervised training, we use supervised model trained with labeled data to estimate predicate labels for entity pairs. So before conduct semi-supervised training, we should conduct a normal supervised training on Predcls task first:

sh cmds/20/motif/predcls/sup/train.sh

Or just download the trained model here, and put it into $EXP/20/predcls/sup/sup.

Noted that, for three tasks Predcls, Sgcls, Sgdet, we all use supervised model of Predcls task to initialize predicate label distributions. After the preparation, we can run:

sh cmds/20/motif/predcls/semi/em_E_step1.sh
sh cmds/20/motif/predcls/semi/em_M_step1.sh
sh cmds/20/motif/predcls/semi/em_E_step2.sh
sh cmds/20/motif/predcls/semi/em_M_step2.sh

Difference from Scene-Graph-Benchmark.pytorch

  1. Fix a bug in evaluation.

    We found that in previous evaluation, there are sometimes duplicated triplets in images, e.g. (1-man, ride, 2-horse)*3. We fix this small bug and use only unique triplets. By fixing the bug, the performance of the model will decrease somewhat. For example, the [email protected] of predcls task will decrease about 1~3 points.

  2. We conduct experiments on 20 categories predicate setting rather than 50 categories.

  3. In evaluation, weakly supervised trained model uses logits rather than softmax normalized scores for relation triplets ranking.

Owner
THUNLP
Natural Language Processing Lab at Tsinghua University
THUNLP
This is the official code of our paper "Diversity-based Trajectory and Goal Selection with Hindsight Experience Relay" (PRICAI 2021)

Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay This is the official implementation of our paper "Diversity-based Traje

Tianhong Dai 6 Jul 18, 2022
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time

Semi Hand-Object Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time (CVPR 2021).

96 Dec 27, 2022
Parallel Latent Tree-Induction for Faster Sequence Encoding

FastTrees This repository contains the experimental code supporting the FastTrees paper by Bill Pung. Software Requirements Python 3.6, NLTK and PyTor

Bill Pung 4 Mar 29, 2022
Neural-net-from-scratch - A simple Neural Network from scratch in Python using the Pymathrix library

A Simple Neural Network from scratch A Simple Neural Network from scratch in Pyt

Youssef Chafiqui 2 Jan 07, 2022
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

AST: Audio Spectrogram Transformer Introduction Citing Getting Started ESC-50 Recipe Speechcommands Recipe AudioSet Recipe Pretrained Models Contact I

Yuan Gong 603 Jan 07, 2023
Deep-Learning-Book-Chapter-Summaries - Attempting to make the Deep Learning Book easier to understand.

Deep-Learning-Book-Chapter-Summaries This repository provides a summary for each chapter of the Deep Learning book by Ian Goodfellow, Yoshua Bengio an

Aman Dalmia 1k Dec 27, 2022
Alfred-Restore-Iterm-Arrangement - An Alfred workflow to restore iTerm2 window Arrangements

Alfred-Restore-Iterm-Arrangement This alfred workflow will list avaliable iTerm2

7 May 10, 2022
JumpDiff: Non-parametric estimator for Jump-diffusion processes for Python

jumpdiff jumpdiff is a python library with non-parametric Nadaraya─Watson estimators to extract the parameters of jump-diffusion processes. With jumpd

Rydin 28 Dec 10, 2022
Code for our NeurIPS 2021 paper 'Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation'

Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation (NeurIPS 2021) Code for our NeurIPS 2021 paper 'Exploiting the Intri

Shiqi Yang 53 Dec 25, 2022
Building blocks for uncertainty-aware cycle consistency presented at NeurIPS'21.

UncertaintyAwareCycleConsistency This repository provides the building blocks and the API for the work presented in the NeurIPS'21 paper Robustness vi

EML Tübingen 19 Dec 12, 2022
PyTorch implementation of Deep HDR Imaging via A Non-Local Network (TIP 2020).

NHDRRNet-PyTorch This is the PyTorch implementation of Deep HDR Imaging via A Non-Local Network (TIP 2020). 0. Differences between Original Paper and

Yutong Zhang 1 Mar 01, 2022
StrongSORT: Make DeepSORT Great Again

StrongSORT StrongSORT: Make DeepSORT Great Again StrongSORT: Make DeepSORT Great Again Yunhao Du, Yang Song, Bo Yang, Yanyun Zhao arxiv 2202.13514 Abs

369 Jan 04, 2023
Weakly-supervised semantic image segmentation with CNNs using point supervision

Code for our ECCV paper What's the Point: Semantic Segmentation with Point Supervision. Summary This library is a custom build of Caffe for semantic i

27 Sep 14, 2022
MADE (Masked Autoencoder Density Estimation) implementation in PyTorch

pytorch-made This code is an implementation of "Masked AutoEncoder for Density Estimation" by Germain et al., 2015. The core idea is that you can turn

Andrej 498 Dec 30, 2022
This repo contains research materials released by members of the Google Brain team in Tokyo.

Brain Tokyo Workshop 🧠 🗼 This repo contains research materials released by members of the Google Brain team in Tokyo. Past Projects Weight Agnostic

Google 1.2k Jan 02, 2023
Code for the paper "Query Embedding on Hyper-relational Knowledge Graphs"

Query Embedding on Hyper-Relational Knowledge Graphs This repository contains the code used for the experiments in the paper Query Embedding on Hyper-

DimitrisAlivas 19 Jul 26, 2022
FG-transformer-TTS Fine-grained style control in transformer-based text-to-speech synthesis

LST-TTS Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis. Submitted to ICASSP 2022. Audi

Li-Wei Chen 64 Dec 30, 2022
Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021.

NL-CSNet-Pytorch Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021. Note: this repo only shows the strategy of

WenxueCui 7 Nov 07, 2022
Official implementation of the ICML2021 paper "Elastic Graph Neural Networks"

ElasticGNN This repository includes the official implementation of ElasticGNN in the paper "Elastic Graph Neural Networks" [ICML 2021]. Xiaorui Liu, W

liuxiaorui 34 Dec 04, 2022
text_recognition_toolbox: The reimplementation of a series of classical scene text recognition papers with Pytorch in a uniform way.

text recognition toolbox 1. 项目介绍 该项目是基于pytorch深度学习框架,以统一的改写方式实现了以下6篇经典的文字识别论文,论文的详情如下。该项目会持续进行更新,欢迎大家提出问题以及对代码进行贡献。 模型 论文标题 发表年份 模型方法划分 CRNN 《An End-t

168 Dec 24, 2022