Hard cater examples from Hopper ICLR paper

Related tags

Deep Learningcater-h
Overview

CATER-h NEC Laboratories America, Inc.

Honglu Zhou*, Asim Kadav, Farley Lai, Alexandru Niculescu-Mizil, Martin Renqiang Min, Mubbasir Kapadia, Hans Peter Graf

(*Contact: [email protected])

CATER-h is the dataset proposed for the Video Reasoning task, specifically, the problem of Object Permanence, investigated in Hopper: Multi-hop Transformer for Spatiotemporal Reasoning accepted to ICLR 2021. Please refer to our full paper for detailed analysis and evaluations.

1. Overview

This repository provides the CATER-h dataset used in the paper "Hopper: Multi-hop Transformer for Spatiotemporal Reasoning", as well as instructions/code to create the CATER-h dataset.

If you find the dataset or the code helpful, please cite:

Honglu Zhou, Asim Kadav, Farley Lai, Alexandru Niculescu-Mizil, Martin Renqiang Min, Mubbasir Kapadia, Hans Peter Graf. Hopper: Multi-hop Transformer for Spatiotemporal Reasoning. In International Conference on Learning Representations (ICLR), 2021.

@inproceedings{zhou2021caterh,
    title = {{Hopper: Multi-hop Transformer for Spatiotemporal Reasoning}},
    author = {Zhou, Honglu and Kadav, Asim and Lai, Farley and Niculescu-Mizil, Alexandru and Min, Martin Renqiang and Kapadia, Mubbasir and Graf, Hans Peter},
    booktitle = {ICLR},
    year = 2021
}  

2. Dataset

A pre-generated sample of the dataset used in the paper is provided here. If you'd like to generate a version of the dataset, please follow instructions in the following.

3. Requirements

  1. All CLEVR requirements (eg, Blender: the code was used with v2.79b).
  2. This code was used on Linux machines.
  3. GPU: This code was tested with multiple types of GPUs and should be compatible with most GPUs. By default it will use all the GPUs on the machine.
  4. All DETR requirements. You can check the site-packages of our conda environment (Python3.7.6) used.

4. Generating CATER-h

4.1 Generating videos and labels

(We modify code provided by CATER.)

  1. cd generate/

  2. echo $PWD >> blender-2.79b-linux-glibc219-x86_64/2.79/python/lib/python3.5/site-packages/clevr.pth (You can download our blender-2.79b-linux-glibc219-x86_64.)

  3. Run time python launch.py to start generating. Please read through the script to change any settings, paths etc. The command line options should also be easy to follow from the script (e.g., --num_images specifies the number of videos to generate).

  4. time python gen_train_test.py to generate labels for the dataset for each of the tasks. Change the parameters on the top of the file, and run it.

4.2 Obtaining frame and object features

You can find our extracted frame and object features here. The CNN backbone we utilized to obtain the frame features is a pre-trained ResNeXt-101 model. We use DETR trained on the LA-CATER dataset to obtain object features.

4.3 Filtering data by the frame index of the last visible snitch

  1. cd extract/

  2. Download our pretrained object detector from here. Create a folder checkpoints. Put the pretrained object detector into the folder checkpoints.

  3. Change paths etc in extract/configs/CATER-h.yml

  4. time ./run.sh

This will generate an output folder with pickle files that save the frame index of the last visible snitch and the detector's confidence.

  1. Run resample.ipynb which will resample the data to have balanced train/val set in terms of the class label and the frame index of the last visible snitch.

Acknowledgments

The code in this repository is heavily based on the following publically available implementations:

Owner
NECLA ML Group
NEC Labs America, Machine Learning Group
NECLA ML Group
Caffe-like explicit model constructor. C(onfig)Model

cmodel Caffe-like explicit model constructor. C(onfig)Model Installation pip install git+https://github.com/bonlime/cmodel Usage In order to allow usi

1 Feb 18, 2022
Compute execution plan: A DAG representation of work that you want to get done. Individual nodes of the DAG could be simple python or shell tasks or complex deeply nested parallel branches or embedded DAGs themselves.

Hello from magnus Magnus provides four capabilities for data teams: Compute execution plan: A DAG representation of work that you want to get done. In

12 Feb 08, 2022
Simulation of moving particles under microscopic imaging

Simulation of moving particles under microscopic imaging Install scipy numpy scikit-image tiffile Run python simulation.py Read result https://imagej

Zehao Wang 2 Dec 14, 2021
for a paper about leveraging discourse markers for training new models

TSLM-DISCOURSE-MARKERS Scope This repository contains: (1) Code to extract discourse markers from wikipedia (TSA). (1) Code to extract significant dis

International Business Machines 6 Nov 02, 2022
PyTorch implementation for View-Guided Point Cloud Completion

PyTorch implementation for View-Guided Point Cloud Completion

22 Jan 04, 2023
Per-Pixel Classification is Not All You Need for Semantic Segmentation

MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation Bowen Cheng, Alexander G. Schwing, Alexander Kirillov [arXiv] [Proj

Facebook Research 1k Jan 08, 2023
Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR)

Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR) This is the official implementation of our paper Personalized Tran

Yongchun Zhu 81 Dec 29, 2022
Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation.

MosaicOS Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation. Introduction M

Cheng Zhang 27 Oct 12, 2022
Code for the CVPR 2021 paper "Triple-cooperative Video Shadow Detection"

Triple-cooperative Video Shadow Detection Code and dataset for the CVPR 2021 paper "Triple-cooperative Video Shadow Detection"[arXiv link] [official l

Zhihao Chen 24 Oct 04, 2022
Experiments with Fourier layers on simulation data.

Factorized Fourier Neural Operators This repository contains the code to reproduce the results in our NeurIPS 2021 ML4PS workshop paper, Factorized Fo

Alasdair Tran 57 Dec 25, 2022
The code for "Deep Level Set for Box-supervised Instance Segmentation in Aerial Images".

Deep Levelset for Box-supervised Instance Segmentation in Aerial Images Wentong Li, Yijie Chen, Wenyu Liu, Jianke Zhu* This code is based on MMdetecti

sunshine.lwt 112 Jan 05, 2023
Using some basic methods to show linkages and transformations of robotic arms

roboticArmVisualizer Python GUI application to create custom linkages and adjust joint angles. In the future, I plan to add 2d inverse kinematics solv

Sandesh Banskota 1 Nov 19, 2021
automatic color-grading

color-matcher Description color-matcher enables color transfer across images which comes in handy for automatic color-grading of photographs, painting

hahnec 168 Jan 05, 2023
CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

Citylearn Challenge This is the PyTorch implementation for PikaPika team, CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energ

bigAIdream projects 10 Oct 10, 2022
Deep Networks with Recurrent Layer Aggregation

RLA-Net: Recurrent Layer Aggregation Recurrence along Depth: Deep Networks with Recurrent Layer Aggregation This is an implementation of RLA-Net (acce

Joy Fang 21 Aug 16, 2022
ShuttleNet: Position-aware Fusion of Rally Progress and Player Styles for Stroke Forecasting in Badminton (AAAI 2022)

ShuttleNet: Position-aware Rally Progress and Player Styles Fusion for Stroke Forecasting in Badminton (AAAI 2022) Official code of the paper ShuttleN

Wei-Yao Wang 11 Nov 30, 2022
Blender Python - Node-based multi-line text and image flowchart

MindMapper v0.8 Node-based text and image flowchart for Blender Mindmap with shortcuts visible: Mindmap with shortcuts hidden: Notes This was requeste

SpectralVectors 58 Oct 08, 2022
LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021

LoFTR-with-train-script LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021 (with train script --- unofficial ---). About Megadepth

Nan Xiaohu 15 Nov 04, 2022
This is the official implementation for "Do Transformers Really Perform Bad for Graph Representation?".

Graphormer By Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng*, Guolin Ke, Di He*, Yanming Shen and Tie-Yan Liu. This repo is the official impl

Microsoft 1.3k Dec 26, 2022
Multi-view 3D reconstruction using neural rendering. Unofficial implementation of UNISURF, VolSDF, NeuS and more.

Volume rendering + 3D implicit surface Showcase What? previous: surface rendering; now: volume rendering previous: NeRF's volume density; now: implici

Jianfei Guo 682 Jan 04, 2023