Official Repository of NeurIPS2021 paper: PTR

Related tags

Deep LearningPTR
Overview

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

Dataset Overview

Figure 1. Dataset Overview.

Introduction

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies. Such composite structures could induce a rich set of semantic concepts and relations, thus playing an important role in the interpretation and organization of visual signals as well as for the generalization of visual perception and reasoning. However, existing visual reasoning benchmarks mostly focus on objects rather than parts. Visual reasoning based on the full part-whole hierarchy is much more challenging than object-centric reasoning due to finer-grained concepts, richer geometry relations, and more complex physics. Therefore, to better serve for part-based conceptual, relational and physical reasoning, we introduce a new large-scale diagnostic visual reasoning dataset named PTR. PTR contains around 70k RGBD synthetic images with ground truth object and part level annotations regarding semantic instance segmentation, color attributes, spatial and geometric relationships, and certain physical properties such as stability. These images are paired with 700k machine-generated questions covering various types of reasoning types, making them a good testbed for visual reasoning models. We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes in situations where humans can easily infer the correct answer. We believe this dataset will open up new opportunities for part-based reasoning.

PTR is accepted by NeurIPS 2021.

Authors: Yining Hong, Li Yi, Joshua B Tenenbaum, Antonio Torralba and Chuang Gan from UCLA, MIT, IBM, Stanford and Tsinghua.

Arxiv Version: https://arxiv.org/abs/2112.05136

Project Page: http://ptr.csail.mit.edu/

Download

Data and evaluation server can be found here

TODOs

baseline models will be available soon!

About the Data

The data includes train/val/test images / questions / scene annotations / depths. Note that due to data cleaning process, the indices of the images are not necessarily consecutive.

The scene annotation is a json file that contains the following keys:

    cam_location        #location of the camera
    cam_rotation        #rotation of the camera
    directions          #Based on the camera, the vectors of the directions
    image_filename      #the filename of the image
    image_index         #the index of the image
    objects             #the objects in the scene, which contains a list of objects
        3d_coords       #the location of the object
        category        #the object category
        line_geo        #a dictionary containing (part, line unit normal vector) pairs. See the [unit normal vector](https://sites.math.washington.edu/~king/coursedir/m445w04/notes/vector/normals-plane.html) of a line. If the vector is not a unit vector, then the part cannot be considered a line.
        plane_geo       #a dictionary containing (part, plane unit normal vector) pairs. See the [unit normal vector](https://sites.math.washington.edu/~king/coursedir/m445w04/notes/vector/normals-plane.html) of a plane. If the vector is not a unit vector, then the part cannot be considered a line.
        obj_mask        #the mask of the object
        part_color      #a dictionary containing the colors of the parts
        part_count      #a dictionary containing the number of the parts
        part_mask       #a dictionary containing the masks of the parts
        partnet_id      #the id of the original partnet object in the PartNet dataset
        pixel_coords    #the pixel of the object
    relationships       #according to the directions, the spatial relationships of the objects
    projection_matrix   #the projection matrix of the camera to reconstruct 3D scene using depths
    physics(optional)   #if physics in the keys and the key is True, this is a physical scene.

The question file is a json file which contains a list of questions. Each question has the following keys:

    image_filename      #the image file that the question asks about
    image_index         #the image index that the question asks about
    program             #the original program used to generate the question
    program_nsclseq     #rearranged program as described in the paper
    question            #the question text
    answer              #the answer text
    type1               #the five questions types
    type2               #the 14 subtypes described in Table 2 in the paper

Data Generation Engine

The images and scene annotations can be generated via invoking data_generation/image_generation/render_images_partnet.py

blender --background --python render_images_partnet.py -- [args]

To generate physical scenes, invoke data_generation/image_generation/render_images_physics.py

blender --background --python render_images_physics.py -- [args]

For more instructions on image generation, please go to this directory and see the README file

To generate questions and answers based on the images, please go to this directory, and run

python generate_questions.py --input_scene_dir $INPUT_SCENE_DIR --output_dir $OUTPUT_QUESTION_DIR --output_questions_file $OUTPUT_FILE

The data generation engine is based partly on the CLEVR generation engine.

Errata

We have manually examined the images, annotations and questions twice. However, provided that there are annotation errors of the PartNet dataset we used, there could still be some errors in the scene annotations. If you find any errors that make the questions unanswerable, please contact [email protected].

Citations

@inproceedings{hong2021ptr,
author = {Hong, Yining and Yi, Li and Tenenbaum, Joshua B and Torralba, Antonio and Gan, Chuang},
title = {PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning},
booktitle = {Advances In Neural Information Processing Systems},
year = {2021}
}
Owner
Yining Hong
https://evelinehong.github.io
Yining Hong
Good Classification Measures and How to Find Them

Good Classification Measures and How to Find Them This repository contains supplementary materials for the paper "Good Classification Measures and How

Yandex Research 7 Nov 13, 2022
An Unbiased Learning To Rank Algorithms (ULTRA) toolbox

Unbiased Learning to Rank Algorithms (ULTRA) This is an Unbiased Learning To Rank Algorithms (ULTRA) toolbox, which provides a codebase for experiment

back 3 Nov 18, 2022
This program presents convolutional kernel density estimation, a method used to detect intercritical epilpetic spikes (IEDs)

Description This program presents convolutional kernel density estimation, a method used to detect intercritical epilpetic spikes (IEDs) in [Gardy et

Ludovic Gardy 0 Feb 09, 2022
NeurIPS-2021: Neural Auto-Curricula in Two-Player Zero-Sum Games.

NAC Official PyTorch implementation of NAC from the paper: Neural Auto-Curricula in Two-Player Zero-Sum Games. We release code for: Gradient based ora

Xidong Feng 19 Nov 11, 2022
Alleviating Over-segmentation Errors by Detecting Action Boundaries

Alleviating Over-segmentation Errors by Detecting Action Boundaries Forked from ASRF offical code. This repo is the a implementation of replacing orig

13 Dec 12, 2022
MetaBalance: Improving Multi-Task Recommendations via Adapting Gradient Magnitudes of Auxiliary Tasks

MetaBalance: Improving Multi-Task Recommendations via Adapting Gradient Magnitudes of Auxiliary Tasks Introduction This repo contains the pytorch impl

Meta Research 38 Oct 10, 2022
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools

Hugging Face Optimum 🤗 Optimum is an extension of 🤗 Transformers, providing a set of performance optimization tools enabling maximum efficiency to t

Hugging Face 842 Dec 30, 2022
RMTD: Robust Moving Target Defence Against False Data Injection Attacks in Power Grids

RMTD: Robust Moving Target Defence Against False Data Injection Attacks in Power Grids Real-time detection performance. This repo contains the code an

0 Nov 10, 2021
EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling This is the official implementation for "Frustratingly Simple Pretraining Al

Atsuki Yamaguchi 31 Nov 18, 2022
Style transfer between images was performed using the VGG19 model

Style transfer between images was performed using the VGG19 model. The necessary codes, libraries and all other information of this project are available below

Onur yılmaz 2 May 09, 2022
PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambiguation for Partial Label Learning

PiCO: Contrastive Label Disambiguation for Partial Label Learning This is a PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambig

王皓波 147 Jan 07, 2023
StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation Demo video: CVPR 2021 Oral: Single Channel Manipulation: Localized or attribu

Zongze Wu 267 Dec 30, 2022
Differentiable simulation for system identification and visuomotor control

gradsim gradSim: Differentiable simulation for system identification and visuomotor control gradSim is a unified differentiable rendering and multiphy

105 Dec 18, 2022
Task Transformer Network for Joint MRI Reconstruction and Super-Resolution (MICCAI 2021)

T2Net Task Transformer Network for Joint MRI Reconstruction and Super-Resolution (MICCAI 2021) [Paper][Code] Dependencies numpy==1.18.5 scikit_image==

64 Nov 23, 2022
Instance-conditional Knowledge Distillation for Object Detection

Instance-conditional Knowledge Distillation for Object Detection This is a MegEngine implementation of the paper "Instance-conditional Knowledge Disti

MEGVII Research 47 Nov 17, 2022
Self-Supervised Learning for Domain Adaptation on Point-Clouds

Self-Supervised Learning for Domain Adaptation on Point-Clouds Introduction Self-supervised learning (SSL) allows to learn useful representations from

Idan Achituve 66 Dec 20, 2022
Cancer metastasis detection with neural conditional random field (NCRF)

NCRF Prerequisites Data Whole slide images Annotations Patch images Model Training Testing Tissue mask Probability map Tumor localization FROC evaluat

Baidu Research 731 Jan 01, 2023
Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

STCN Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang [a

Rex Cheng 456 Dec 12, 2022
ByteTrack(Multi-Object Tracking by Associating Every Detection Box)のPythonでのONNX推論サンプル

ByteTrack-ONNX-Sample ByteTrack(Multi-Object Tracking by Associating Every Detection Box)のPythonでのONNX推論サンプルです。 ONNXに変換したモデルも同梱しています。 変換自体を試したい方はByteT

KazuhitoTakahashi 16 Oct 26, 2022
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

BossNAS This repository contains PyTorch evaluation code, retraining code and pretrained models of our paper: BossNAS: Exploring Hybrid CNN-transforme

Changlin Li 127 Dec 26, 2022