Layered Neural Atlases for Consistent Video Editing

Overview

Project Page | Paper

This repository contains an implementation for the SIGGRAPH Asia 2021 paper Layered Neural Atlases for Consistent Video Editing.

The paper introduces the first approach for neural video unwrapping using an end-to-end optimized interpretable and semantic atlas-based representation, which facilitates easy and intuitive editing in the atlas domain.

Installation Requirements

The code is compatible with Python 3.7 and PyTorch 1.6.

You can create an anaconda environment called neural_atlases with the required dependencies by running:

conda create --name neural_atlases python=3.7
conda activate neural_atlases
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboard scipy scikit-image tqdm opencv -c pytorch
pip install imageio-ffmpeg gdown
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html

Data convention

The code expects 3 folders for each video input, e.g. for a video of 50 frames named "blackswan":

  1. data/blackswan: A folder of video frames containing image files in the following convention: blackswan/00000.jpg, blackswan/00001.jpg, ..., blackswan/00049.jpg (as in the DAVIS dataset).
  2. data/blackswan_flow: A folder with forward and backward optical flow files in the following convention: blackswan_flow/00000.jpg_00001.jpg.npy, blackswan_flow/00001.jpg_00000.jpg.npy, ..., blackswan_flow/00049.jpg_00048.jpg.npy.
  3. data/blackswan_maskrcnn: A folder with rough masks (created by Mask-RCNN or any other way) containing files in the following convention: blackswan_maskrcnn/00000.jpg, blackswan_maskrcnn/00001.jpg, ..., blackswan_maskrcnn/00049.jpg.
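
The naming convention above can be checked before training with a short script. The following is a minimal sketch, not part of the repository: the helper names expected_files and missing_files are hypothetical, and it assumes that flow files exist in both directions for every pair of consecutive frames, as the listed examples suggest.

```python
from pathlib import Path


def expected_files(data_root, video_name, num_frames):
    """Return the paths implied by the data convention for one video.

    Assumption: forward and backward flow is stored for every pair of
    consecutive frames, matching the examples in the list above.
    """
    frames = [f"{i:05d}.jpg" for i in range(num_frames)]
    files = [f"{video_name}/{name}" for name in frames]
    files += [f"{video_name}_maskrcnn/{name}" for name in frames]
    for a, b in zip(frames[:-1], frames[1:]):
        files.append(f"{video_name}_flow/{a}_{b}.npy")  # forward flow
        files.append(f"{video_name}_flow/{b}_{a}.npy")  # backward flow
    return [Path(data_root) / f for f in files]


def missing_files(data_root, video_name, num_frames):
    """List the expected paths that do not exist on disk."""
    expected = expected_files(data_root, video_name, num_frames)
    return [p for p in expected if not p.exists()]
```

For example, missing_files("data", "blackswan", 50) returns an empty list when all three folders are complete.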

For a few examples of DAVIS sequences run:

gdown https://drive.google.com/uc?id=1WipZR9LaANTNJh764ukznXXAANJ5TChe
unzip data.zip

Masks extraction

Given only the video frames folder data/blackswan it is possible to extract the Mask-RCNN masks (and create the required folder data/blackswan_maskrcnn) by running:

python preprocess_mask_rcnn.py --vid-path data/blackswan --class_name bird

where --class_name determines the COCO class name of the sought foreground object. It is also possible to choose the first instance retrieved by Mask-RCNN by using --class_name anything. This is useful when Mask-RCNN produces correct masks but assigns them the wrong class, as in the "libby" video:

python preprocess_mask_rcnn.py --vid-path data/libby --class_name anything

Optical flows extraction

The optical flow folder can be generated using RAFT. To link RAFT into the current project, run:

git submodule update --init
cd thirdparty/RAFT/
./download_models.sh
cd ../..

To extract the optical flows (and create the required folder data/blackswan_flow), run:

python preprocess_optical_flow.py --vid-path data/blackswan --max_long_edge 768

Pretrained models

For downloading a sample set of our pretrained models together with sample edits run:

gdown https://drive.google.com/uc?id=10voSCdMGM5HTIYfT0bPW029W9y6Xij4D
unzip pretrained_models.zip

Training

For training a model on a video, run:

python train.py config/config.json

where the video frames folder is determined by the config parameter "data_folder". To reduce training time, you can evaluate less often by increasing the parameter "evaluate_every" (e.g. to 10000). The other configurable parameters are documented inside the file train.py.
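
To train on several videos without editing config/config.json by hand, a per-video copy of the config can be generated. This is a minimal sketch: write_config is a hypothetical helper, and it assumes "data_folder" and "evaluate_every" are top-level keys of the JSON file.

```python
import json
from pathlib import Path


def write_config(base_path, out_path, **overrides):
    """Copy a JSON config, replacing the given top-level keys.

    Raises KeyError for keys missing from the base config, so a
    misspelled parameter name is caught before training starts.
    """
    config = json.loads(Path(base_path).read_text())
    unknown = [key for key in overrides if key not in config]
    if unknown:
        raise KeyError(f"keys not present in {base_path}: {unknown}")
    config.update(overrides)
    Path(out_path).write_text(json.dumps(config, indent=4))
    return config


# Example (the output file name is an assumption):
# write_config("config/config.json", "config/blackswan.json",
#              data_folder="data/blackswan", evaluate_every=10000)
```

The generated file can then be passed to train.py in place of config/config.json.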

Evaluation

The model is evaluated periodically during training. To run evaluation only, on an already trained model, run:

python only_evaluate.py --trained_model_folder=pretrained_models/checkpoints/blackswan --video_name=blackswan --data_folder=data --output_folder=evaluation_outputs

where trained_model_folder is the path to a folder that contains the config.json and checkpoint files of the trained model.

Editing

To apply editing, run the script only_edit.py. Examples for the supplied pretrained models for "blackswan" and "boat":

python only_edit.py --trained_model_folder=pretrained_models/checkpoints/blackswan --video_name=blackswan --data_folder=data --output_folder=editing_outputs --edit_foreground_path=pretrained_models/edit_inputs/blackswan/edit_blackswan_foreground.png --edit_background_path=pretrained_models/edit_inputs/blackswan/edit_blackswan_background.png
python only_edit.py --trained_model_folder=pretrained_models/checkpoints/boat --video_name=boat --data_folder=data --output_folder=editing_outputs --edit_foreground_path=pretrained_models/edit_inputs/boat/edit_boat_foreground.png --edit_background_path=pretrained_models/edit_inputs/boat/edit_boat_backgound.png

where edit_foreground_path and edit_background_path specify the paths to 1000x1000 images of the RGBA atlas edits.
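
Before running only_edit.py, it can help to confirm that an edit image really is a 1000x1000 RGBA file. The sketch below is not part of the repository: it uses only the Python standard library and reads the PNG header directly, and the helper names read_png_header and is_valid_atlas_edit are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPE_RGBA = 6  # truecolor with alpha, per the PNG specification


def read_png_header(data):
    """Parse (width, height, bit_depth, color_type) from the first 26 PNG bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: two big-endian uint32, then two bytes.
    return struct.unpack(">IIBB", data[16:26])


def is_valid_atlas_edit(path, size=1000):
    """Return True if the file is a size x size RGBA PNG."""
    with open(path, "rb") as f:
        width, height, _, color_type = read_png_header(f.read(26))
    return (width, height) == (size, size) and color_type == COLOR_TYPE_RGBA
```

Checking only the header keeps the test cheap, since no pixel data is decoded.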

To apply an edit that was made directly on a video frame (e.g. for the pretrained "libby"):

python only_edit.py --trained_model_folder=pretrained_models/checkpoints/libby --video_name=libby --data_folder=data --output_folder=editing_outputs  --use_edit_frame --edit_frame_index=7 --edit_frame_path=pretrained_models/edit_inputs/libby/edit_frame_.png

Citation

If you find our work useful in your research, please consider citing:

@article{kasten2021layered,
  title={Layered Neural Atlases for Consistent Video Editing},
  author={Kasten, Yoni and Ofri, Dolev and Wang, Oliver and Dekel, Tali},
  journal={arXiv preprint arXiv:2109.11418},
  year={2021}
}