Repo for "Physion: Evaluating Physical Prediction from Vision in Humans and Machines" submission to NeurIPS 2021 (Datasets & Benchmarks track)

Overview

Physion: Evaluating Physical Prediction from Vision in Humans and Machines

Animation of the 8 scenarios

This repo contains code and data to reproduce the results in our paper, Physion: Evaluating Physical Prediction from Vision in Humans and Machines. Please see below for details about how to download the Physion dataset, replicate our modeling and human experiments, and run the statistical analyses that reproduce our results.

  1. Downloading the Physion dataset
  2. Dataset generation
  3. Modeling experiments
  4. Human experiments
  5. Comparing models and humans

Downloading the Physion dataset

Downloading the Physion test set (a.k.a. stimuli)

PhysionTest-Core (270 MB)

PhysionTest-Core is all you need to evaluate humans and models on exactly the same test stimuli used in our paper.

It contains eight directories, one for each scenario type (collide, contain, dominoes, drape, drop, link, roll, support).

Each of these directories contains three subdirectories:

  • maps: Contains PNG segmentation maps for each test stimulus, indicating the location of the agent object in red and the patient object in yellow.
  • mp4s: Contains the MP4 video files presented to human participants. The agent and patient objects appear in random colors.
  • mp4s-redyellow: Contains the MP4 video files passed into models. The agent and patient objects consistently appear in red and yellow, respectively.

Download URL: https://physics-benchmarking-neurips2021-dataset.s3.amazonaws.com/Physion.zip.
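
Once the archive is unpacked, the layout described above can be checked with a short script. This is a minimal sketch, not part of the repo: the function name `index_test_core` is hypothetical, and it assumes that the scenario directories sit directly under the folder you pass in (the capitalization of directory names inside the archive is not stated here, so names are matched case-insensitively).

```python
from pathlib import Path

# Scenario and subdirectory names as listed in this README.
SCENARIOS = ["collide", "contain", "dominoes", "drape",
             "drop", "link", "roll", "support"]
SUBDIRS = ["maps", "mp4s", "mp4s-redyellow"]


def index_test_core(root):
    """Return {scenario: {subdir: sorted file names}} for scenario folders under root.

    Assumption: `root` is the unpacked PhysionTest-Core folder. Folders whose
    lowercased name is not a known scenario are ignored; a missing
    subdirectory is reported as an empty list.
    """
    root = Path(root)
    index = {}
    for scenario_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        name = scenario_dir.name.lower()
        if name not in SCENARIOS:
            continue
        index[name] = {}
        for sub in SUBDIRS:
            sub_dir = scenario_dir / sub
            if sub_dir.is_dir():
                files = sorted(f.name for f in sub_dir.iterdir() if f.is_file())
            else:
                files = []
            index[name][sub] = files
    return index


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        # Print a per-scenario file count for each subdirectory.
        for scenario, subdirs in index_test_core(sys.argv[1]).items():
            counts = ", ".join(f"{sub}: {len(files)}" for sub, files in subdirs.items())
            print(f"{scenario}: {counts}")
```

Comparing the file counts across mp4s and mp4s-redyellow is a quick way to confirm that humans and models would see the same set of stimuli.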

PhysionTest-Complete (380 GB)

PhysionTest-Complete is what you want if you need more detailed metadata for each test stimulus.

Each stimulus is encoded in an HDF5 file containing comprehensive information regarding depth, surface normals, optical flow, and segmentation maps associated with each frame of each trial, as well as other information about the physical states of objects at each time step.

Download URL: https://physics-benchmarking-neurips2021-dataset.s3.amazonaws.com/PhysionTestHDF5.tar.gz.

You can also download the testing data for individual scenarios from the table in the next section.

Downloading the Physion training set

Downloading PhysionTrain-Dynamics

PhysionTrain-Dynamics contains the full dataset used to train the dynamics module of models benchmarked in our paper. It consists of approximately 2K stimuli per scenario type.

Download URL (770 MB): https://physics-benchmarking-neurips2021-dataset.s3.amazonaws.com/PhysionTrainMP4s.tar.gz

Downloading PhysionTrain-Readout

PhysionTrain-Readout contains a separate dataset used for training the object-contact prediction (OCP) module for models pretrained on the PhysionTrain-Dynamics dataset. It consists of 1K stimuli per scenario type.

The agent and patient objects in each of these readout stimuli consistently appear in red and yellow, respectively (as in the mp4s-redyellow examples from PhysionTest-Core above).

NB: Code for using these readout sets to benchmark any pretrained model (not just models trained on the Physion training sets) will be released prior to publication.

Download URLs for complete PhysionTrain-Dynamics and PhysionTrain-Readout:

Scenario | Dynamics Training Set            | Readout Training Set            | Test Set
---------|----------------------------------|---------------------------------|-----------------------
Dominoes | Dominoes_dynamics_training_HDF5s | Dominoes_readout_training_HDF5s | Dominoes_testing_HDF5s
Support  | Support_dynamics_training_HDF5s  | Support_readout_training_HDF5s  | Support_testing_HDF5s
Collide  | Collide_dynamics_training_HDF5s  | Collide_readout_training_HDF5s  | Collide_testing_HDF5s
Contain  | Contain_dynamics_training_HDF5s  | Contain_readout_training_HDF5s  | Contain_testing_HDF5s
Drop     | Drop_dynamics_training_HDF5s     | Drop_readout_training_HDF5s     | Drop_testing_HDF5s
Roll     | Roll_dynamics_training_HDF5s     | Roll_readout_training_HDF5s     | Roll_testing_HDF5s
Link     | Link_dynamics_training_HDF5s     | Link_readout_training_HDF5s     | Link_testing_HDF5s
Drape    | Drape_dynamics_training_HDF5s    | Drape_readout_training_HDF5s    | Drape_testing_HDF5s

Dataset generation

This repo depends on outputs from tdw_physics.

Specifically, tdw_physics is used to generate the dataset of physical scenarios (a.k.a. stimuli), including both the training datasets used to train physical-prediction models, as well as test datasets used to measure prediction accuracy in both physical-prediction models and human participants.

Instructions for using the ThreeDWorld simulator to regenerate datasets used in our work can be found here. Links for downloading the Physion testing, training, and readout fitting datasets can be found here.

Modeling experiments

The modeling component of this repo depends on the physopt repo. The physopt repo implements an interface through which a wide variety of physics prediction models from the literature (be they neural networks or otherwise) can be adapted to accept the inputs provided by our training and testing datasets and produce outputs for comparison with our human measurements.

The physopt repo also contains code for model training and evaluation. Specifically, physopt implements three train/test protocols:

  • An only protocol, in which each candidate physics model architecture is trained (using that model's native loss function, as specified by the model's authors) separately on each of the scenarios listed above (e.g., "dominoes", "support", etc.). This produces eight separately trained models per candidate architecture (one for each scenario). Each of these models is then tested in comparison to humans on the testing data for that scenario.
  • An all protocol, in which each candidate physics architecture is trained on mixed data from all of the scenarios simultaneously (again, using that model's native loss function). This single model is then tested and compared to humans separately on each scenario.
  • An all-but-one protocol, in which each candidate physics architecture is trained on mixed data drawn from all but one scenario, separately for all possible choices of the held-out scenario. This produces eight separately trained models per candidate architecture (one for each held-out scenario). Each of these models is then tested in comparison to humans on the testing data for the held-out scenario.
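
The three protocols differ only in which scenarios each trained model sees and which scenarios it is tested on. The sketch below enumerates those splits; it is an illustration, not physopt code. The function name `protocol_splits` is hypothetical, and it reads the all-but-one protocol as testing each model on its held-out scenario.

```python
# Scenario names as used in this README.
SCENARIOS = ["dominoes", "support", "collide", "contain",
             "drop", "roll", "link", "drape"]


def protocol_splits(protocol, scenarios=SCENARIOS):
    """Return one (train_scenarios, test_scenarios) pair per trained model.

    Hypothetical helper for illustration; physopt's real interface may differ.
    """
    scenarios = list(scenarios)
    if protocol == "only":
        # One model per scenario, trained and tested on that scenario.
        return [([s], [s]) for s in scenarios]
    if protocol == "all":
        # A single model trained on everything, tested on each scenario.
        return [(scenarios, scenarios)]
    if protocol == "all-but-one":
        # One model per held-out scenario; assumed to be tested on the held-out one.
        return [([s for s in scenarios if s != held_out], [held_out])
                for held_out in scenarios]
    raise ValueError(f"unknown protocol: {protocol!r}")


if __name__ == "__main__":
    for name in ("only", "all", "all-but-one"):
        print(name, "->", len(protocol_splits(name)), "trained model(s)")
```

Under this reading, the only and all-but-one protocols each yield eight trained models per architecture and the all protocol yields one, matching the counts given above.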

Results from each of the three protocols are separately compared to humans (as described below in the section on comparing models and humans). All model-human comparisons are carried out using a representation-learning paradigm, in which models are trained on their native loss functions (as encoded by the original authors of the model). Trained models are then evaluated on the specific Physion red-object-contacts-yellow-zone prediction task. This evaluation is carried out by further training a "readout", implemented as a linear logistic regression. Readouts are always trained in a per-scenario fashion.
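
To make the readout step concrete, here is a minimal sketch of a per-scenario logistic-regression readout fit on top of frozen model features. It is not the physopt implementation: the function names, the plain gradient-descent fit, the learning rate, and the toy features are all assumptions chosen to keep the example dependency-free. In practice a library implementation of logistic regression would be the more usual choice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_readout(features, labels, lr=0.1, epochs=500):
    """Fit weights and bias of a linear logistic regression by batch gradient descent.

    features: list of feature vectors (e.g. frozen model representations)
    labels:   list of 0/1 targets (1 = red object contacts yellow zone)
    """
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_contact(readout, x):
    """Return True if the readout predicts contact for feature vector x."""
    weights, bias = readout
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5


def fit_readouts_per_scenario(data):
    """Fit one independent readout per scenario: {scenario: (features, labels)}."""
    return {scenario: fit_readout(X, y) for scenario, (X, y) in data.items()}


if __name__ == "__main__":
    # Toy one-dimensional features standing in for real model representations.
    toy = {"dominoes": ([[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1])}
    readouts = fit_readouts_per_scenario(toy)
    print(predict_contact(readouts["dominoes"], [1.5]))
```

Because the upstream model is frozen and only the readout is fit, differences in readout accuracy reflect how linearly decodable the contact outcome is from each model's learned representation.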

Currently, physopt implements the following specific physics prediction models:

Model Name | Our Code Link | Original Paper               | Description
-----------|---------------|------------------------------|------------------
SVG        |               | Denton and Fergus 2018       | Image-like latent
OP3        |               | Veerapaneni et al. 2020      |
CSWM       |               | Kipf et al. 2020             |
RPIN       |               | Qi et al. 2021               |
pVGG-mlp   |               |                              |
pVGG-lstm  |               |                              |
pDEIT-mlp  |               | Touvron et al. 2020          |
pDEIT-lstm |               |                              |
GNS        |               | Sanchez-Gonzalez et al. 2020 |
GNS-R      |               |                              |
DPI        |               | Li et al. 2019               |

Human experiments

This repo contains code to conduct the human behavioral experiments reported in this paper, as well as analyze the resulting data from both human and modeling experiments.

The details of the experimental design and analysis plan are documented in our study preregistration contained within this repository. The format for this preregistration is adapted from the templates provided by the Open Science Framework, and the preregistration is kept under the same version control as the rest of the codebase for this project.

Here is what each main directory in this repo contains:

  • experiments: This directory contains code to run the online human behavioral experiments reported in this paper. More detailed documentation of this code can be found in the README file nested within the experiments subdirectory.
  • analysis (aka notebooks): This directory contains our Jupyter/Rmd analysis notebooks. This repo assumes you have also imported model evaluation results from physopt.
  • results: This directory contains "intermediate" results of modeling/human experiments. It contains three subdirectories: csv, plots, and summary.
    • /results/csv/ contains CSV files with tidy dataframes of "raw" data.
    • /results/plots/ contains .pdf/.png plots, a selection of which are then polished and formatted for inclusion in the paper using Adobe Illustrator.
    • Important: Before pushing any CSV files containing human behavioral data to a public code repository, triple check that this data is properly anonymized. This means no bare AMT Worker IDs or Prolific participant IDs.
  • stimuli: This directory contains any download/preprocessing scripts for data (a.k.a. stimuli) that are the inputs to human behavioral experiments. This repo assumes you have generated stimuli using tdw_physics. This repo uses code in this directory to upload stimuli to AWS S3 and generate metadata to control the timeline of stimulus presentation in the human behavioral experiments.
  • utils: This directory is meant to contain any files containing general helper functions.
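
The anonymization check called for above can be partly automated. The sketch below scans a CSV for cell values that look like raw participant IDs. It is not part of this repo, and both patterns are assumptions about typical ID formats (AMT Worker IDs as an uppercase "A" followed by 11 to 13 uppercase letters or digits, Prolific IDs as 24 hexadecimal characters); treat a clean result as a first pass, not a guarantee.

```python
import csv
import re

# Assumed ID formats (not taken from this repo); adjust to your own data.
AMT_PATTERN = re.compile(r"\bA[0-9A-Z]{11,13}\b")
PROLIFIC_PATTERN = re.compile(r"\b[0-9a-f]{24}\b")


def find_possible_ids(csv_file):
    """Return (line_number, column, value) for cells resembling raw participant IDs.

    csv_file: an open text file (or any iterable of lines) with a header row.
    Line numbers count the header as line 1.
    """
    hits = []
    reader = csv.DictReader(csv_file)
    for line_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # skip missing cells and overflow fields
            if AMT_PATTERN.search(value) or PROLIFIC_PATTERN.search(value):
                hits.append((line_number, column, value))
    return hits


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, newline="") as f:
            for line_number, column, value in find_possible_ids(f):
                print(f"{path}:{line_number} column {column!r} looks like an ID: {value}")
```

Running a script like this over results/csv before each push would catch the most obvious leaks; hashed or otherwise pseudonymized IDs that happen to be 24 hex characters will also be flagged and need a manual look.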

Comparing models and humans

The results reported in this paper can be reproduced by running the Jupyter notebooks contained in the analysis directory.

  1. Downloading results. To download the "raw" human and model prediction behavior, please navigate to the analysis directory and execute the following command at the command line: python download_results.py. This script will fetch several CSV files and download them to subdirectories within results/csv. If this does not work, please download this zipped folder (csv) and move it to the results directory: https://physics-benchmarking-neurips2021-dataset.s3.amazonaws.com/model_human_results.zip.
  2. Reproducing analyses. To reproduce the key analyses reported in the paper, please run the following notebooks in this sequence:
    • summarize_human_model_behavior.ipynb: The purpose of this notebook is to:
      • Apply preprocessing to human behavioral data
      • Visualize distribution and compute summary statistics over human physical judgments
      • Visualize distribution and compute summary statistics over model physical judgments
      • Conduct human-model comparisons
      • Output summary CSVs that can be used for further statistical modeling & create publication-quality visualizations
    • inference_human_model_behavior.ipynb: The purpose of this notebook is to:
      • Visualize human and model prediction accuracy (proportion correct)
      • Visualize average-human and model agreement (RMSE)
      • Visualize human-human and model-human agreement (Cohen's kappa)
      • Compare performance between models
    • paper_plots.ipynb: The purpose of this notebook is to create publication-quality figures for inclusion in the paper.
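
The three agreement measures named above (proportion correct, RMSE, and Cohen's kappa) can be written out in a few lines. The sketch below uses the standard textbook definitions; the function names are hypothetical, and the notebooks may aggregate over stimuli or participants differently before applying them.

```python
import math


def proportion_correct(predictions, truths):
    """Fraction of binary predictions that match the ground-truth outcomes."""
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences,
    e.g. average human response vs. model output per stimulus."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def cohens_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two label sequences.

    Returns NaN when chance agreement is 1 (both raters always give the same
    single label), since kappa is undefined in that case.
    """
    rater1, rater2 = list(rater1), list(rater2)
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    labels = set(rater1) | set(rater2)
    expected = sum((rater1.count(l) / n) * (rater2.count(l) / n) for l in labels)
    if expected == 1.0:
        return float("nan")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    human = [1, 1, 0, 0]
    model = [1, 0, 0, 0]
    print(proportion_correct(model, human), cohens_kappa(human, model))
```

Kappa is the stricter of the two agreement scores: two raters who are both accurate can still have low kappa if they err on different stimuli, which is why it complements raw accuracy.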
Owner
Cognitive Tools Lab
reverse engineering the human cognitive toolkit