[CVPR 2021] Forecasting the panoptic segmentation of future video frames

Last update: Nov 29, 2022

Overview

Panoptic Segmentation Forecasting

Colin Graber, Grace Tsai, Michael Firman, Gabriel Brostow, Alexander Schwing - CVPR 2021

We propose to study the novel task of ‘panoptic segmentation forecasting’: given a set of observed frames, the goal is to forecast the panoptic segmentation for a set of unobserved frames. We also propose a first approach to forecasting future panoptic segmentations. In contrast to typical semantic forecasting, we model the motion of individual object instances and the background separately. This makes instance information persistent during forecasting, and allows us to understand the motion of each moving object.
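The following minimal NumPy sketch shows one way a forecast background semantic map and forecast foreground instance masks could be merged into a single panoptic map. It is only an illustration of the foreground/background split, not the method implemented in this repository. The function name `compose_panoptic`, the far-to-near ordering of instances, and the `class_id * 1000 + index` id encoding (borrowed from the Cityscapes instanceIds convention) are all assumptions.

```python
import numpy as np

# Assumed id encoding, mirroring Cityscapes instanceIds:
# stuff pixels keep id = class_id, thing pixels get class_id * 1000 + instance_index.
THING_OFFSET = 1000


def compose_panoptic(bg_semantic, instances):
    """Hypothetical merge of a forecast background map with forecast instances.

    bg_semantic: (H, W) int array of stuff class ids for the future frame.
    instances: list of (mask, class_id) pairs, where mask is an (H, W) bool array.
        The list is assumed to be ordered from farthest to nearest, so nearer
        objects overwrite farther ones where masks overlap.
    """
    panoptic = bg_semantic.astype(np.int64).copy()  # leave the input untouched
    for index, (mask, class_id) in enumerate(instances):
        # Each instance gets its own id, so it stays distinguishable from
        # other objects of the same class.
        panoptic[mask] = class_id * THING_OFFSET + index
    return panoptic
```

Keeping a separate id per instance, rather than only a class label per pixel, is what lets individual objects remain identifiable in the forecast output.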

⚙️ Setup

Dependencies

Python 3.7
PyTorch 1.5.1
pyyaml
pandas
h5py
opencv
tensorboard
tqdm
pytorch_scatter 2.0.5
cityscapesscripts (for evaluation)
Google Cloud SDK (for downloading data/models)
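The sketch below checks that the Python dependencies listed above are importable. The mapping from package names to import names (for example opencv as cv2 and pytorch_scatter as torch_scatter) is an assumption, `missing_dependencies` is a hypothetical helper, and the check only tests presence, not the pinned versions (Python 3.7, PyTorch 1.5.1, pytorch_scatter 2.0.5).

```python
import importlib.util

# Assumed import name for each listed package (package name -> module name).
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "pyyaml": "yaml",
    "pandas": "pandas",
    "h5py": "h5py",
    "opencv": "cv2",
    "tensorboard": "tensorboard",
    "tqdm": "tqdm",
    "pytorch_scatter": "torch_scatter",
    "cityscapesscripts": "cityscapesscripts",
}


def missing_dependencies(required=REQUIRED_MODULES):
    """Return the packages whose top-level module cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "none")
```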

Install the code using the following command: pip install -e ./

Data

To run this code, download the gtFine_trainvaltest dataset from the Cityscapes website into the data/ directory.
The remainder of the required data can be downloaded using the script download_data.sh. By default, everything is downloaded into the data/ directory.
Training the background model requires generating a version of the semantic segmentation annotations where foreground regions have been removed. This can be done by running the script scripts/preprocessing/remove_fg_from_gt.sh.
Training the foreground model additionally requires a pretrained MaskRCNN model. This can be found at this link. Save it as pretrained_models/fg/mask_rcnn_pretrain.pkl.
Training the background model additionally requires a pretrained HarDNet model. This can be found at this link. Save it as pretrained_models/bg/hardnet70_cityscapes_model.pkl.
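A small script like the one below can confirm the expected layout before training. It is a sketch: the two model paths come from this section, but `data/gtFine` as the extraction location of gtFine_trainvaltest and running the check from the repository root are assumptions, and `missing_paths` is a hypothetical helper.

```python
from pathlib import Path

# Model paths are taken from this README; "data/gtFine" is an ASSUMED
# extraction location for the Cityscapes gtFine_trainvaltest archive.
EXPECTED_PATHS = [
    "data/gtFine",
    "pretrained_models/fg/mask_rcnn_pretrain.pkl",
    "pretrained_models/bg/hardnet70_cityscapes_model.pkl",
]


def missing_paths(repo_root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    for p in missing_paths():
        print("missing:", p)
```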

Running our code

The scripts directory contains scripts which can be used to train and evaluate the foreground, background, and egomotion models. Specifically:

scripts/odom/run_odom_train.sh trains the egomotion prediction model.
scripts/odom/export_odom.sh exports the odometry predictions, which other models can then use during evaluation.
scripts/bg/run_bg_train.sh trains the background prediction model.
scripts/bg/run_export_bg_val.sh exports predictions made by the background model, using as input point clouds that were reprojected with predicted egomotion.
scripts/fg/run_fg_train.sh trains the foreground prediction model.
scripts/fg/run_fg_eval_panoptic.sh produces the final panoptic segmentation predictions from the trained foreground model and the exported background predictions. It also uses predicted egomotion as input.
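The reprojected point clouds mentioned above can be illustrated with a standard pinhole-camera sketch: back-project each pixel of a depth map to 3D, move the points with a rigid transform (for example one built from predicted egomotion), and project them into the future camera. This is a generic NumPy illustration under assumed inputs (metric depth, intrinsics `K` shared by both frames, a 4x4 transform), not this repository's implementation; `reproject_depth` is a hypothetical name.

```python
import numpy as np


def reproject_depth(depth, K, T_future_from_current):
    """Hypothetical pinhole reprojection of a depth map into a future frame.

    depth: (H, W) metric depth for the current frame.
    K: (3, 3) camera intrinsics, assumed identical for both frames.
    T_future_from_current: (4, 4) rigid transform, e.g. from predicted egomotion.
    Returns (N, 2) future-frame pixel coordinates and (N,) depths for the
    points that end up in front of the future camera.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=0)  # (3, N)
    rays = np.linalg.inv(K) @ pixels          # viewing rays at unit depth
    points = rays * depth.ravel()             # (3, N) points in the current camera
    points_h = np.vstack([points, np.ones((1, h * w))])
    future = (T_future_from_current @ points_h)[:3]  # points in the future camera
    future = future[:, future[2] > 0]         # drop points behind the camera
    projected = K @ future
    uv = (projected[:2] / projected[2]).T     # perspective divide
    return uv, future[2]
```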

We provide our pretrained foreground, background, and egomotion prediction models. The data download script (download_data.sh) also downloads these models into the pretrained_models/ directory.
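The scripts listed above can be chained in an order that respects their data flow: egomotion predictions are exported before the background export and the panoptic evaluation, both of which consume them. The runner below is a hypothetical sketch. It assumes each script runs from the repository root without arguments, and that, when training is skipped, the remaining scripts are already configured to use the models in pretrained_models/; check the scripts for required paths or config edits first.

```python
import subprocess

# (script, is_training_step), in an order consistent with the data flow above.
PIPELINE = [
    ("scripts/odom/run_odom_train.sh", True),
    ("scripts/odom/export_odom.sh", False),
    ("scripts/bg/run_bg_train.sh", True),
    ("scripts/bg/run_export_bg_val.sh", False),
    ("scripts/fg/run_fg_train.sh", True),
    ("scripts/fg/run_fg_eval_panoptic.sh", False),
]


def run_pipeline(include_training=True, dry_run=True):
    """Build (and optionally execute) the bash commands in order.

    include_training=False skips the training scripts (ASSUMES the other
    scripts then pick up the pretrained models). dry_run=True only returns
    the command list without executing anything.
    """
    commands = [["bash", script] for script, is_training in PIPELINE
                if include_training or not is_training]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing script
    return commands
```

Keeping `dry_run=True` as the default makes it easy to inspect the planned commands before launching long training jobs.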

✏️ 📄 Citation

If you found our work relevant to yours, please consider citing our paper:

@inproceedings{graber-2021-panopticforecasting,
 title   = {Panoptic Segmentation Forecasting},
 author  = {Colin Graber and
            Grace Tsai and
            Michael Firman and
            Gabriel Brostow and
            Alexander Schwing},
 booktitle = {Computer Vision and Pattern Recognition ({CVPR})},
 year = {2021}
}

Owner

Niantic Labs
