Implicit Internal Video Inpainting

Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation

paper | project website | 4K data | demo video

Introduction

Want to remove objects from a video without days of training and thousands of training videos? Try our simple but effective internal video inpainting method. The inpainting process is zero-shot and implicit: it requires no pretraining on large datasets and no optical-flow estimation. We further extend the proposed method to two more challenging tasks: video object removal with limited annotated masks, and inpainting of ultra high-resolution videos (e.g., 4K videos).

TO DO

  • Release code for 4K video inpainting

Setup

Installation

git clone https://github.com/Tengfei-Wang/Implicit-Internal-Video-Inpainting.git
cd Implicit-Internal-Video-Inpainting

Environment

This code is based on TensorFlow 2.x (tested on TensorFlow 2.2 and 2.4).

The environment can be set up with Anaconda:

conda create -n IIVI python=3.7
conda activate IIVI
conda install tensorflow-gpu tensorboard
pip install pyaml 
pip install opencv-python
pip install tensorflow-addons

Alternatively, you can set up the environment from the provided environment.yml:

conda env create -f environment.yml
conda activate IIVI
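
Either way, you can run a quick sanity check (a minimal snippet, not part of the repository) to confirm that TensorFlow imports and sees your GPU:

import tensorflow as tf

print(tf.__version__)                          # expect 2.2 or 2.4, the tested versions
print(tf.config.list_physical_devices('GPU'))  # should list at least one GPU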

Usage

Quick Start

We provide an example sequence 'bmx-trees' in ./inputs/. To try our method:

python train.py

The default number of iterations is set to 50,000 in config/train.yml, and the internal learning takes ~4 hours on a single GPU. During training, you can monitor the inpainting results with TensorBoard:

tensorboard --logdir ./exp/logs

After training, the final results can be saved to ./exp/results/ by running:

python test.py

You can also modify 'model_restore' in config/test.yml to save results from different checkpoints.

Try Your Own Data

Data preprocess

Before training, we advise dilating the object masks to exclude edge pixels; otherwise, imperfectly annotated masks can lead to artifacts in the object removal task.

You can generate and preprocess the masks by this script:

python scripts/preprocess_mask.py --annotation_path inputs/annotations/bmx-trees
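
For reference, the dilation step boils down to a morphological dilation per mask; here is a minimal OpenCV sketch, where the kernel size and the output path are assumptions rather than the script's exact settings:

import glob
import os

import cv2
import numpy as np

mask_dir = 'inputs/annotations/bmx-trees'  # annotation masks (path from the command above)
out_dir = 'inputs/masks/bmx-trees'         # hypothetical output directory
os.makedirs(out_dir, exist_ok=True)

kernel = np.ones((15, 15), np.uint8)       # assumed kernel size; larger means stronger dilation
for path in sorted(glob.glob(os.path.join(mask_dir, '*.png'))):
    mask = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    mask = cv2.dilate(mask, kernel, iterations=1)  # grow the mask to cover edge pixels
    cv2.imwrite(os.path.join(out_dir, os.path.basename(path)), mask)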

Basic training

Modify config/train.yml, which specifies the video path, log path, training iterations, etc. The required number of iterations depends on the video length: 100-frame videos typically take 30,000 ~ 80,000 iterations to converge. By default, we only use the reconstruction loss for training, which works well for most cases.

python train.py

Improve the sharpness and consistency

For some hard videos, basic training may not produce a pleasing result. You can fine-tune the trained model with additional losses. To this end, set 'model_restore' in config/train.yml to the checkpoint path of the basic training, set ambiguity_loss or stabilization_loss to True, and then fine-tune the basic checkpoint for 20,000 ~ 40,000 iterations (a config sketch follows the command below).

python train.py
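
As a reference for this config edit, here is a hedged sketch that toggles the fine-tuning options programmatically. Only model_restore, ambiguity_loss, and stabilization_loss are named above; the flat YAML layout, the iterations key, and the checkpoint path are assumptions:

import yaml

with open('config/train.yml') as f:
    cfg = yaml.safe_load(f)

cfg['model_restore'] = 'exp/logs/checkpoint-50000'  # hypothetical checkpoint from basic training
cfg['ambiguity_loss'] = True       # named above; sharpens ambiguous regions
cfg['stabilization_loss'] = True   # named above; improves temporal consistency
cfg['iterations'] = 30000          # assumed key name; 20,000 ~ 40,000 per the text

with open('config/train.yml', 'w') as f:
    yaml.safe_dump(cfg, f)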

Inference

Modify ./config/test.yml, which specifies the video path, log path, and save path.

python test.py

Mask Propagation from A Single Frame

When you annotate the object mask in only one frame (or a few frames), our method can propagate it to the other frames automatically.

Modify ./config/train_mask.yml. We typically set the training iterations to 4,000 ~ 20,000, and the learning rate to 1e-5 ~ 1e-4.

python train_mask.py

After training, modify ./config/test_mask.yml, and then:

python test_mask.py
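
To sanity-check the propagated masks, it can help to overlay them on the corresponding frames before running inpainting. A minimal, hypothetical helper (the directory names and the assumption that frames and masks are same-named PNGs are ours, not the repository's):

import glob
import os

import cv2

frame_dir = 'inputs/frames/bmx-trees'  # hypothetical frame directory
mask_dir = 'exp/results/masks'         # hypothetical output directory of test_mask.py

frames = sorted(glob.glob(os.path.join(frame_dir, '*.png')))
masks = sorted(glob.glob(os.path.join(mask_dir, '*.png')))
for f_path, m_path in zip(frames, masks):
    frame = cv2.imread(f_path)
    mask = cv2.imread(m_path, cv2.IMREAD_GRAYSCALE)
    overlay = frame.copy()
    overlay[mask > 127] = (0, 0, 255)  # paint masked pixels red (BGR)
    blend = cv2.addWeighted(frame, 0.6, overlay, 0.4, 0)
    cv2.imwrite('overlay_' + os.path.basename(f_path), blend)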

High-resolution Video Inpainting

Our 4K videos and mask annotations can be downloaded from 4K data.

More Results

Our results on 70 DAVIS videos (including failure cases) can be found here for your reference :)
If you need the PNG version of our uncompressed results, please contact the authors.

Citation

If you find this work useful for your research, please cite:

@inproceedings{ouyang2021video,
  title={Internal Video Inpainting by Implicit Long-range Propagation},
  author={Ouyang, Hao and Wang, Tengfei and Chen, Qifeng},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2021}
} 

If you are also interested in image inpainting or internal learning, this paper may also be helpful :)

@inproceedings{wang2021image,
  title={Image Inpainting with External-internal Learning and Monochromic Bottleneck},
  author={Wang, Tengfei and Ouyang, Hao and Chen, Qifeng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5120--5129},
  year={2021}
}

Contact

Please send an email to Hao Ouyang or Tengfei Wang if you have any questions.
