NeuralDiff: Segmenting 3D objects that move in egocentric videos

Overview

NeuralDiff: Segmenting 3D objects that move in egocentric videos

Project Page | Paper + Supplementary | Video

teaser

About

This repository contains the official implementation of the paper NeuralDiff: Segmenting 3D objects that move in egocentric videos by Vadim Tschernezki, Diane Larlus and Andrea Vedaldi. Published at 3DV21.

Given a raw video sequence taken from a freely-moving camera, we study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground containing the objects that move in the video sequence. This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion due to the camera large viewpoint change. In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them. We achieve this factorization by reconstructing the video via a triple-stream neural rendering network that explains the different motions based on corresponding inductive biases. We demonstrate that our method can successfully separate the different types of motion, outperforming recent neural rendering baselines at this task, and can accurately segment moving objects. We do so by assessing the method empirically on challenging videos from the EPIC-KITCHENS dataset which we augment with appropriate annotations to create a new benchmark for the task of dynamic object segmentation on unconstrained video sequences, for complex 3D environments.

Installation

We provide an environment config file for anaconda. You can install and activate it with the following commands:

conda env create -f environment.yaml
conda activate neuraldiff

Dataset

The EPIC-Diff dataset can be downloaded here.

After downloading, move the compressed dataset to the directory of the cloned repository (e.g. NeuralDiff). Then, apply following commands:

mkdir data
mv EPIC-Diff.tar.gz data
cd data
tar -xzvf EPIC-Diff.tar.gz

The RGB frames are hosted separately as a subset from the EPIC-Kitchens dataset. The data are available at the University of Bristol data repository, data.bris. Once downloaded, move the folders into the same directory as mentioned before (data/EPIC-Diff).

Pretrained models

We are providing model checkpoints for all 10 scenes. You can use these to

  • evaluate the models with the annotations from the EPIC-Diff benchmark
  • create a summary video like at the top of this README to visualise the separation of the video into background, foreground and actor

The models can be downloaded here (about 50MB in total).

Once downloaded, place ckpts.tar.gz into the main directory. Then execute tar -xzvf ckpts.tar.gz. This will create a folder ckpts with the pretrained models.

Reproducing results

Visualisations and metrics per scene

To evaluate the scene with Video ID P01_01, use the following command:

sh scripts/eval.sh rel P01_01 rel 'masks' 0 0

The results are saved in results/rel. The subfolders contain a txt file containing the mAP and PSNR scores per scene and visualisations per sample.

You can find all scene IDs in the EPIC-Diff data folder (e.g. P01_01, P03_04, ... P21_01).

Average metrics over all scenes

You can calculate the average of the metrics over all scenes (Table 1 in the paper) with the following command:

sh scripts/eval.sh rel 0 0 'average' 0 0

Make sure that you have calculated the metrics per scene before proceeding with that (this command simply reads the produced metrics per scene and averages them).

Rendering a video with separation of background, foreground and actor

To visualise the different model components of a reconstructed video (as seen on top of this page) from

  1. the ground truth camera poses corresponding to the time of the video
  2. and a fixed viewpoint, use the following command:
sh scripts/eval.sh rel P01_01 rel 'summary' 0 0

This will result in a corresponding video in the folder results/rel/P01_01/summary.

The fixed viewpoints are pre-defined and correspond to the ones that we used in the videos provided in the supplementary material. You can adjust the viewpoints in __init__.py of dataset.

Training

We provide scripts for the proposed model (including colour normalisation). To train a model for scene P01_01, use the following command.

sh scripts/train.sh P01_01

You can visualise the training with tensorboard. The logs are stored in logs.

Citation

If you found our code or paper useful, then please cite our work as follows.

@inproceedings{tschernezki21neuraldiff,
  author     = {Vadim Tschernezki and Diane Larlus and
                Andrea Vedaldi},
  booktitle  = {Proceedings of the International Conference
                on {3D} Vision (3DV)},
  title      = {{NeuralDiff}: Segmenting {3D} objects that
                move in egocentric videos},
  year       = {2021}
}

Acknowledgements

This implementation is based on this (official NeRF) and this repository (unofficial NeRF-W).

Our dataset is based on a sub-set of frames from EPIC-Kitchens. COLMAP was used for computing 3D information for these frames and VGG Image Annotator (VIA) was used for annotating them.

Owner
Vadim Tschernezki
Vadim Tschernezki
NeurIPS 2021 Datasets and Benchmarks Track

AP-10K: A Benchmark for Animal Pose Estimation in the Wild Introduction | Updates | Overview | Download | Training Code | Key Questions | License Intr

AP-10K 82 Dec 11, 2022
Code for paper "ASAP-Net: Attention and Structure Aware Point Cloud Sequence Segmentation"

ASAP-Net This project implements ASAP-Net of paper ASAP-Net: Attention and Structure Aware Point Cloud Sequence Segmentation (BMVC2020). Overview We i

Hanwen Cao 26 Aug 25, 2022
Pytorch implementation of SimSiam Architecture

SimSiam-pytorch A simple pytorch implementation of Exploring Simple Siamese Representation Learning which is developed by Facebook AI Research (FAIR)

Saeed Shurrab 1 Oct 20, 2021
✨✨✨An awesome open source toolbox for stereo matching.

OpenStereo This is an awesome open source toolbox for stereo matching. Supported Methods: BM SGM(T-PAMI'07) GCNet(ICCV'17) PSMNet(CVPR'18) StereoNet(E

Wang Qingyu 6 Nov 04, 2022
Spectral normalization (SN) is a widely-used technique for improving the stability and sample quality of Generative Adversarial Networks (GANs)

Why Spectral Normalization Stabilizes GANs: Analysis and Improvements [paper (NeurIPS 2021)] [paper (arXiv)] [code] Authors: Zinan Lin, Vyas Sekar, Gi

Zinan Lin 32 Dec 16, 2022
Block Sparse movement pruning

Movement Pruning: Adaptive Sparsity by Fine-Tuning Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; ho

Hugging Face 54 Dec 20, 2022
Using Python to Play Cyberpunk 2077

CyberPython 2077 Using Python to Play Cyberpunk 2077 This repo will contain code from the Cyberpython 2077 video series on Youtube (youtube.

Harrison 118 Oct 18, 2022
TensorLight - A high-level framework for TensorFlow

TensorLight is a high-level framework for TensorFlow-based machine intelligence applications. It reduces boilerplate code and enables advanced feature

Benjamin Kan 10 Jul 31, 2022
Label-Free Model Evaluation with Semi-Structured Dataset Representations

Label-Free Model Evaluation with Semi-Structured Dataset Representations Prerequisites This code uses the following libraries Python 3.7 NumPy PyTorch

8 Oct 06, 2022
Open Source Light Field Toolbox for Super-Resolution

BasicLFSR BasicLFSR is an open-source and easy-to-use Light Field (LF) image Super-Ressolution (SR) toolbox based on PyTorch, including a collection o

Squidward 50 Nov 18, 2022
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [Paper] [Colab is coming soon] Approach Example Usage To r

170 Jan 03, 2023
Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning, NeurIPS 2021 (Spotlight)

Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning, NeurIPS 2021 (Spotlight) Abstract Due to the limited and even imbalanced dat

Hanzhe Hu 99 Dec 12, 2022
VGG16 model-based classification project about brain tumor detection.

Brain-Tumor-Classification-with-MRI VGG16 model-based classification project about brain tumor detection. First, you can check what people are doing o

Atakan Erdoğan 2 Mar 21, 2022
This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

haifeng xia 32 Oct 26, 2022
PyTorch EO aims to make Deep Learning for Earth Observation data easy and accessible to real-world cases and research alike.

Pytorch EO Deep Learning for Earth Observation applications and research. 🚧 This project is in early development, so bugs and breaking changes are ex

earthpulse 28 Aug 25, 2022
Portfolio analytics for quants, written in Python

QuantStats: Portfolio analytics for quants QuantStats Python library that performs portfolio profiling, allowing quants and portfolio managers to unde

Ran Aroussi 2.7k Jan 08, 2023
This is a TensorFlow implementation for C2-Rec

This is a TensorFlow implementation for C2-Rec We refer to the repo SASRec. Requirements requirement.txt Datasets This repo includes Amazon Beauty dat

7 Nov 14, 2022
Differentiable Abundance Matching With Python

shamnet Differentiable Stellar Population Synthesis Installation You can install shamnet with pip. Installation dependencies are numpy, jax, corrfunc,

5 Dec 17, 2021
Fashion Entity Classification

Fashion-Entity-Classification - Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grays

ADITYA SHAH 1 Jan 04, 2022
Object recognition using Azure Custom Vision AI and Azure Functions

Step by Step on how to create an object recognition model using Custom Vision, export the model and run the model in an Azure Function

El Bruno 11 Jul 08, 2022