Hard CATER examples from the Hopper ICLR paper

Last update: May 11, 2021

CATER-h

Honglu Zhou^*, Asim Kadav, Farley Lai, Alexandru Niculescu-Mizil, Martin Renqiang Min, Mubbasir Kapadia, Hans Peter Graf

(^*Contact: [email protected])

CATER-h is the dataset proposed for the video reasoning task of Object Permanence, as investigated in "Hopper: Multi-hop Transformer for Spatiotemporal Reasoning", accepted to ICLR 2021. Please refer to the full paper for detailed analysis and evaluations.

1. Overview

This repository provides the CATER-h dataset used in the paper "Hopper: Multi-hop Transformer for Spatiotemporal Reasoning", as well as instructions and code for generating the CATER-h dataset.

If you find the dataset or the code helpful, please cite:

Honglu Zhou, Asim Kadav, Farley Lai, Alexandru Niculescu-Mizil, Martin Renqiang Min, Mubbasir Kapadia, Hans Peter Graf. Hopper: Multi-hop Transformer for Spatiotemporal Reasoning. In International Conference on Learning Representations (ICLR), 2021.

@inproceedings{zhou2021caterh,
    title = {{Hopper: Multi-hop Transformer for Spatiotemporal Reasoning}},
    author = {Zhou, Honglu and Kadav, Asim and Lai, Farley and Niculescu-Mizil, Alexandru and Min, Martin Renqiang and Kapadia, Mubbasir and Graf, Hans Peter},
    booktitle = {ICLR},
    year = 2021
}

2. Dataset

A pre-generated sample of the dataset used in the paper is provided here. If you would like to generate your own version of the dataset, follow the instructions below.

3. Requirements

All CLEVR requirements (e.g., Blender; the code was used with v2.79b).
This code was used on Linux machines.
GPU: This code was tested with multiple types of GPUs and should be compatible with most GPUs. By default, it uses all the GPUs on the machine.
All DETR requirements. You can check the site-packages of the conda environment we used (Python 3.7.6).

4. Generating CATER-h

4.1 `Generating videos and labels`

(We modified the code provided by CATER.)

cd generate/
echo $PWD >> blender-2.79b-linux-glibc219-x86_64/2.79/python/lib/python3.5/site-packages/clevr.pth (You can download our blender-2.79b-linux-glibc219-x86_64 build.)
Run `time python launch.py` to start generating. Read through the script to change any settings, paths, etc. The command-line options should be easy to follow from the script (e.g., --num_images specifies the number of videos to generate).
Run `time python gen_train_test.py` to generate labels for each of the tasks. Change the parameters at the top of the file, then run it.
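The `echo $PWD >> .../clevr.pth` step relies on Python's `.pth` mechanism: when a site-packages directory is processed, each line of every `.pth` file in it that names an existing directory is appended to `sys.path`, which is what lets Blender's bundled Python import the modules in generate/. The sketch below reproduces that step with the standard-library `site` module on temporary directories. The helper `register_project_dir` is hypothetical, and we assume Blender's Python treats its site-packages the way a standard interpreter does.

```python
import os
import site
import sys
import tempfile


def register_project_dir(site_packages: str, project_dir: str,
                         pth_name: str = "clevr.pth") -> str:
    """Append project_dir to a .pth file, mirroring `echo $PWD >> .../clevr.pth`.

    register_project_dir is a hypothetical helper written for illustration.
    """
    pth_path = os.path.join(site_packages, pth_name)
    # ">>" in the shell appends, so open the file in append mode.
    with open(pth_path, "a") as f:
        f.write(project_dir + "\n")
    return pth_path


if __name__ == "__main__":
    # Temporary stand-ins for Blender's site-packages and the generate/ folder.
    with tempfile.TemporaryDirectory() as fake_site, \
            tempfile.TemporaryDirectory() as fake_generate:
        register_project_dir(fake_site, fake_generate)
        # site.addsitedir() reads every *.pth file in the directory and appends
        # each listed path that exists on disk to sys.path.
        site.addsitedir(fake_site)
        print(fake_generate in sys.path)  # prints True
```

Because the shell command appends rather than overwrites, running it twice leaves a duplicate line in clevr.pth; this is harmless, since `site` skips paths it has already added.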

4.2 `Obtaining frame and object features`

You can find our extracted frame and object features here. The frame features are obtained with a pre-trained ResNeXt-101 CNN backbone, and the object features with DETR trained on the LA-CATER dataset.

4.3 `Filtering data by the frame index of the last visible snitch`

cd extract/
Download our pretrained object detector from here, create a folder named checkpoints, and put the pretrained object detector into it.
Change the paths, etc., in extract/configs/CATER-h.yml.
Run `time ./run.sh`.

This generates an output folder of pickle files that store the frame index of the last visible snitch and the detector's confidence.
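To make the filtering criterion concrete, here is a minimal sketch of how a "last visible snitch" frame index and its confidence could be derived from per-frame detector outputs. It assumes a hypothetical input format (one snitch confidence per frame, 0.0 when no snitch is detected), a hypothetical visibility threshold `min_conf`, and a hypothetical pickle record layout; the actual outputs of run.sh and the criterion used in the repository may differ.

```python
import pickle
from typing import List, Optional, Tuple


def last_visible_snitch(snitch_conf: List[float],
                        min_conf: float = 0.5) -> Tuple[Optional[int], float]:
    """Return (frame_index, confidence) for the last frame whose snitch score is
    at least min_conf, or (None, 0.0) if the snitch is never detected.

    snitch_conf is a hypothetical per-frame list holding the detector's highest
    snitch confidence (0.0 when no snitch box is predicted); min_conf is a
    hypothetical visibility threshold.
    """
    # Scan backwards from the final frame so the first hit is the last visible one.
    for idx in range(len(snitch_conf) - 1, -1, -1):
        if snitch_conf[idx] >= min_conf:
            return idx, snitch_conf[idx]
    return None, 0.0


def save_result(path: str, video_id: str, snitch_conf: List[float],
                min_conf: float = 0.5) -> None:
    """Pickle one video's result (hypothetical record layout, not run.sh's)."""
    frame_idx, conf = last_visible_snitch(snitch_conf, min_conf)
    record = {"video": video_id, "last_visible_frame": frame_idx, "confidence": conf}
    with open(path, "wb") as f:
        pickle.dump(record, f)


if __name__ == "__main__":
    print(last_visible_snitch([0.9, 0.8, 0.1, 0.7, 0.2]))  # (3, 0.7)
```

Keeping the confidence next to the frame index lets low-confidence detections be inspected or discarded later instead of being baked into the threshold.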

Run resample.ipynb, which resamples the data so that the train/val sets are balanced with respect to both the class label and the frame index of the last visible snitch.
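The notebook itself is not reproduced here. As a rough illustration of what balancing by class label and last-visible frame index can mean, the sketch below groups samples by (label, frame-index bin), downsamples every group to the smallest group's size, and splits each group into train and val with the same ratio. The sample layout, the equal-width binning, `num_bins`, and the function name `balanced_train_val` are all assumptions; resample.ipynb may use a different strategy.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample layout: (video_id, class_label, last_visible_frame).
# Videos where the snitch is never detected should be dropped beforehand.
Sample = Tuple[str, int, int]


def balanced_train_val(samples: List[Sample], num_bins: int, num_frames: int,
                       val_fraction: float = 0.2,
                       seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Downsample every (class label, frame-index bin) group to the size of the
    smallest group, then split each group into train/val with the same ratio,
    so both sets are equally populated across the non-empty groups."""
    groups: Dict[Tuple[int, int], List[Sample]] = defaultdict(list)
    for sample in samples:
        _, label, frame = sample
        # Equal-width bins over [0, num_frames); min() guards out-of-range indices.
        frame_bin = min(frame * num_bins // num_frames, num_bins - 1)
        groups[(label, frame_bin)].append(sample)
    if not groups:
        return [], []

    target = min(len(g) for g in groups.values())
    n_val = int(round(target * val_fraction))
    rng = random.Random(seed)
    train: List[Sample] = []
    val: List[Sample] = []
    for key in sorted(groups):  # sorted keys make the output reproducible for a seed
        picked = rng.sample(groups[key], target)
        val.extend(picked[:n_val])
        train.extend(picked[n_val:])
    return train, val
```

Because every group is cut to the smallest group's size, one rare (label, bin) combination can discard most of the data; fewer, wider bins trade balance granularity for dataset size. Combinations with no samples at all are simply absent from both splits.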

Acknowledgments

The code in this repository is heavily based on the following publicly available implementations:

https://github.com/rohitgirdhar/CATER

Owner

NECLA ML Group
