CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"

Overview

Diverse Structure Inpainting

ArXiv | Papar | Supplementary Material | BibTex

This repository is for the CVPR 2021 paper, "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE".

If our method is useful for your research, please consider citing.

Introduction

(Top) Input incomplete image, where the missing region is depicted in gray. (Middle) Visualization of the generated diverse structures. (Bottom) Output images of our method.

Places2 Results

Results on the Places2 validation set using the center-mask Places2 model.

CelebA-HQ Results

Results on one CelebA-HQ test image with different holes using the random-mask CelebA-HQ model.

Installation

This code was tested with TensorFlow 1.12.0 (later versions may work, excluding 2.x), CUDA 9.0, Python 3.6 and Ubuntu 16.04

Clone this repository๏ผš

git clone https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.git

Datasets

  • CelebA-HQ: the high-resolution face images from Growing GANs. 24183 images for training, 2993 images for validation and 2824 images for testing.
  • Places2: the challenge data from 365 scene categories. 8 Million images for training, 36K images for validation and 328K images for testing.
  • ImageNet: the data from 1000 natural categories. 1 Million images for training and 50K images for validation.

Training

  • Collect the dataset. For CelebA-HQ, we collect the 1024x1024 version. For Places2 and ImageNet, we collect the original version.
  • Prepare the file list. Collect the path of each image and make a file, where each line is a path (end with a carriage return except the last line).
  • Modify checkpoints_dir, dataset, train_flist and valid_flist arguments in train_vqvae.py, train_structure_generator.py and train_texture_generator.py.
  • Modify data/data_loader.py according to the dataset. For CelebA-HQ, we resize each image to 266x266 and randomly crop a 256x256. For Places2 and ImageNet, we randomly crop a 256x256
  • Run python train_vqvae.py to train VQ-VAE.
  • Modify vqvae_network_dir argument in train_structure_generator.py and train_texture_generator.py based on the path of pre-trained VQ-VAE.
  • Modify the mask setting arguments in train_structure_generator.py and train_texture_generator.py to choose center mask or random mask.
  • Run python train_structure_generator.py to train the structure generator.
  • Run python train_texture_generator.py to train the texture generator.
  • Modify structure_generator_dir and texture_generator_dir arguments in save_full_model.py based on the paths of pre-trained structure generator and texture generator.
  • Run python save_full_model.py to save the whole model.

Testing

  • Collect the testing set. For CelebA-HQ, we resize each image to 256x256. For Places2 and ImageNet, we crop a center 256x256.
  • Collect the corresponding mask set (2D grayscale, 0 indicates the known region, 255 indicates the missing region).
  • Prepare the img file list and the mask file list as training.
  • Modify checkpoints_dir, dataset, img_flist and mask_flist arguments in test.py.
  • Download the pre-trained model and put model.ckpt.meta, model.ckpt.index, model.ckpt.data-00000-of-00001 and checkpoint under model_logs/ directory.
  • Run python test.py

Pre-trained Models

Download the pre-trained models using the following links and put them under model_logs/ directory.

The center_mask models are trained with images of 256x256 resolution with center 128x128 holes. The random_mask models are trained with random regular and irregular holes.

Inference Time

One advantage of GAN-based and VAE-based methods is their fast inference speed. We measure that Mutual Encoder-Decoder with Feature Equalizations runs at 0.2 second per image on a single NVIDIA 1080 Ti GPU for images of resolution 256ร—256. In contrast, our model runs at 45 seconds per image. Naively sampling our autoregressive network is the major source of computational time. Fortunately, this time can be reduced by an order of magnitude using an incremental sampling technique which caches and reuses intermediate states of the network. Consider using this technique for faster inference.

[IEEE TPAMI21] MobileSal: Extremely Efficient RGB-D Salient Object Detection [PyTorch & Jittor]

MobileSal IEEE TPAMI 2021: MobileSal: Extremely Efficient RGB-D Salient Object Detection This repository contains full training & testing code, and pr

Yu-Huan Wu 52 Jan 06, 2023
Regulatory Instruments for Fair Personalized Pricing.

Fair pricing Source code for WWW 2022 paper Regulatory Instruments for Fair Personalized Pricing. Installation Requirements Linux with Python = 3.6 p

Renzhe Xu 6 Oct 26, 2022
Official PyTorch implementation of the paper "Self-Supervised Relational Reasoning for Representation Learning", NeurIPS 2020 Spotlight.

Official PyTorch implementation of the paper: "Self-Supervised Relational Reasoning for Representation Learning" (2020), Patacchiola, M., and Storkey,

Massimiliano Patacchiola 135 Jan 03, 2023
VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

Jiezhang Cao 225 Nov 13, 2022
[ICCV2021] 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

3DVG-Transformer This repository is for the ICCV 2021 paper "3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds" Our method "3DV

22 Dec 11, 2022
Language model Prompt And Query Archive

LPAQA: Language model Prompt And Query Archive This repository contains data and code for the paper How Can We Know What Language Models Know? Install

127 Dec 20, 2022
Official implementation of SynthTIGER (Synthetic Text Image GEneratoR) ICDAR 2021

๐Ÿฏ SynthTIGER: Synthetic Text Image GEneratoR Official implementation of SynthTIGER | Paper | Datasets Moonbin Yim1, Yoonsik Kim1, Han-cheol Cho1, Sun

Clova AI Research 256 Jan 05, 2023
Related resources for our EMNLP 2021 paper

Plan-then-Generate: Controlled Data-to-Text Generation via Planning Authors: Yixuan Su, David Vandyke, Sihui Wang, Yimai Fang, and Nigel Collier Code

Yixuan Su 61 Jan 03, 2023
Editing a classifier by rewriting its prediction rules

This repository contains the code and data for our paper: Editing a classifier by rewriting its prediction rules Shibani Santurkar*, Dimitris Tsipras*

Madry Lab 86 Dec 27, 2022
Supplementary materials to "Spin-optomechanical quantum interface enabled by an ultrasmall mechanical and optical mode volume cavity" by H. Raniwala, S. Krastanov, M. Eichenfield, and D. R. Englund, 2022

Supplementary materials to "Spin-optomechanical quantum interface enabled by an ultrasmall mechanical and optical mode volume cavity" by H. Raniwala,

Stefan Krastanov 1 Jan 17, 2022
NumPy๋กœ ๊ตฌํ˜„ํ•œ ๋”ฅ๋Ÿฌ๋‹ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์ž…๋‹ˆ๋‹ค. (์ž๋™ ๋ฏธ๋ถ„ ์ง€์›)

Deep Learning Library only using NumPy ๋ณธ ๋ ˆํฌ์ง€ํ† ๋ฆฌ๋Š” NumPy ๋งŒ์œผ๋กœ ๊ตฌํ˜„ํ•œ ๋”ฅ๋Ÿฌ๋‹ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์ž…๋‹ˆ๋‹ค. ์ž๋™ ๋ฏธ๋ถ„์ด ๊ตฌํ˜„๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ž๋™ ๋ฏธ๋ถ„ ์ž๋™ ๋ฏธ๋ถ„์€ ๋ฏธ๋ถ„์„ ์ž๋™์œผ๋กœ ๊ณ„์‚ฐํ•ด์ฃผ๋Š” ๊ธฐ๋Šฅ์ž…๋‹ˆ๋‹ค. ์•„๋ž˜ ์ฝ”๋“œ๋Š” ์ž๋™ ๋ฏธ๋ถ„์„ ํ™œ์šฉํ•ด ์—ญ์ „ํŒŒ

์กฐ์ค€ํฌ 17 Aug 16, 2022
WRENCH: Weak supeRvision bENCHmark

๐Ÿ”ง What is it? Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development

Jieyu Zhang 176 Dec 28, 2022
Codebase for Time-series Generative Adversarial Networks (TimeGAN)

Codebase for Time-series Generative Adversarial Networks (TimeGAN)

Jinsung Yoon 532 Dec 31, 2022
Code for the paper "Next Generation Reservoir Computing"

Next Generation Reservoir Computing This is the code for the results and figures in our paper "Next Generation Reservoir Computing". They are written

OSU QuantInfo Lab 105 Dec 20, 2022
A keras implementation of ENet (abandoned for the foreseeable future)

ENet-keras This is an implementation of ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation, ported from ENet-training (lua-t

Pavlos 115 Nov 23, 2021
๐Ÿ”ฅ TensorFlow Code for technical report: "YOLOv3: An Incremental Improvement"

๐Ÿ†• Are you looking for a new YOLOv3 implemented by TF2.0 ? If you hate the fucking tensorflow1.x very much, no worries! I have implemented a new YOLOv

3.6k Dec 26, 2022
A PyTorch implementation of Mugs proposed by our paper "Mugs: A Multi-Granular Self-Supervised Learning Framework".

Mugs: A Multi-Granular Self-Supervised Learning Framework This is a PyTorch implementation of Mugs proposed by our paper "Mugs: A Multi-Granular Self-

Sea AI Lab 62 Nov 08, 2022
Statsmodels: statistical modeling and econometrics in Python

About statsmodels statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics an

statsmodels 8.1k Jan 02, 2023
Fast algorithms to compute an approximation of the minimal volume oriented bounding box of a point cloud in 3D.

ApproxMVBB Status Build UnitTests Homepage Fast algorithms to compute an approximation of the minimal volume oriented bounding box of a point cloud in

Gabriel Nรผtzi 390 Dec 31, 2022