PConv-Keras - Unofficial implementation of "Image Inpainting for Irregular Holes Using Partial Convolutions". Try at: www.fixmyphoto.ai

Overview

Partial Convolutions for Image Inpainting using Keras

Keras implementation of "Image Inpainting for Irregular Holes Using Partial Convolutions", https://arxiv.org/abs/1804.07723. A huge shoutout the authors Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao and Bryan Catanzaro from NVIDIA corporation for releasing this awesome paper, it's been a great learning experience for me to implement the architecture, the partial convolutional layer, and the loss functions.

Dependencies

  • Python 3.6
  • Keras 2.2.4
  • Tensorflow 1.12

How to use this repository

The easiest way to try a few predictions with this algorithm is to go to www.fixmyphoto.ai, where I've deployed it on a serverless React application with AWS lambda functions handling inference.

If you want to dig into the code, the primary implementations of the new PConv2D keras layer as well as the UNet-like architecture using these partial convolutional layers can be found in libs/pconv_layer.py and libs/pconv_model.py, respectively - this is where the bulk of the implementation can be found. Beyond this I've set up four jupyter notebooks, which details the several steps I went through while implementing the network, namely:

Step 1: Creating random irregular masks
Step 2: Implementing and testing the implementation of the PConv2D layer
Step 3: Implementing and testing the UNet architecture with PConv2D layers
Step 4: Training & testing the final architecture on ImageNet
Step 5: Simplistic attempt at predicting arbitrary image sizes through image chunking

Pre-trained weights

I've ported the VGG16 weights from PyTorch to keras; this means the 1/255. pixel scaling can be used for the VGG16 network similarly to PyTorch.

Training on your own dataset

You can either go directly to step 4 notebook, or alternatively use the CLI (make sure to download the converted VGG16 weights):

python main.py \
    --name MyDataset \
    --train TRAINING_PATH \
    --validation VALIDATION_PATH \
    --test TEST_PATH \
    --vgg_path './data/logs/pytorch_to_keras_vgg16.h5'

Implementation details

Details of the implementation are in the paper itself, however I'll try to summarize some details here.

Mask Creation

In the paper they use a technique based on occlusion/dis-occlusion between two consecutive frames in videos for creating random irregular masks - instead I've opted for simply creating a simple mask-generator function which uses OpenCV to draw some random irregular shapes which I then use for masks. Plugging in a new mask generation technique later should not be a problem though, and I think the end results are pretty decent using this method as well.

Partial Convolution Layer

A key element in this implementation is the partial convolutional layer. Basically, given the convolutional filter W and the corresponding bias b, the following partial convolution is applied instead of a normal convolution:

where ⊙ is element-wise multiplication and M is a binary mask of 0s and 1s. Importantly, after each partial convolution, the mask is also updated, so that if the convolution was able to condition its output on at least one valid input, then the mask is removed at that location, i.e.

The result of this is that with a sufficiently deep network, the mask will eventually be all ones (i.e. disappear)

UNet Architecture

Specific details of the architecture can be found in the paper, but essentially it's based on a UNet-like structure, where all normal convolutional layers are replace with partial convolutional layers, such that in all cases the image is passed through the network alongside the mask. The following provides an overview of the architecture.

Loss Function(s)

The loss function used in the paper is kinda intense, and can be reviewed in the paper. In short it includes:

  • Per-pixel losses both for maskes and un-masked regions
  • Perceptual loss based on ImageNet pre-trained VGG-16 (pool1, pool2 and pool3 layers)
  • Style loss on VGG-16 features both for predicted image and for computed image (non-hole pixel set to ground truth)
  • Total variation loss for a 1-pixel dilation of the hole region

The weighting of all these loss terms are as follows:

Training Procedure

Network was trained on ImageNet with a batch size of 1, and each epoch was specified to be 10,000 batches long. Training was furthermore performed using the Adam optimizer in two stages since batch normalization presents an issue for the masked convolutions (since mean and variance is calculated for hole pixels).

Stage 1 Learning rate of 0.0001 for 50 epochs with batch normalization enabled in all layers

Stage 2 Learning rate of 0.00005 for 50 epochs where batch normalization in all encoding layers is disabled.

Training time for shown images was absolutely crazy long, but that is likely because of my poor personal setup. The few tests I've tried on a 1080Ti (with batch size of 4) indicates that training time could be around 10 days, as specified in the paper.

Owner
Mathias Gruber
Chief Data Scientist
Mathias Gruber
The Codebase for Causal Distillation for Language Models.

Causal Distillation for Language Models Zhengxuan Wu*,Atticus Geiger*, Josh Rozner, Elisa Kreiss, Hanson Lu, Thomas Icard, Christopher Potts, Noah D.

Zen 20 Dec 31, 2022
PyTorch implementation of MSBG hearing loss model and MBSTOI intelligibility metric

PyTorch implementation of MSBG hearing loss model and MBSTOI intelligibility metric This repository contains the implementation of MSBG hearing loss m

BUT <a href=[email protected]"> 9 Nov 08, 2022
Code for the paper "Relation of the Relations: A New Formalization of the Relation Extraction Problem"

This repo contains the code for the EMNLP 2020 paper "Relation of the Relations: A New Paradigm of the Relation Extraction Problem" (Jin et al., 2020)

YYY 27 Oct 26, 2022
The official implementation of our CVPR 2021 paper - Hybrid Rotation Averaging: A Fast and Robust Rotation Averaging Approach

Graph Optimizer This repo contains the official implementation of our CVPR 2021 paper - Hybrid Rotation Averaging: A Fast and Robust Rotation Averagin

Chenyu 109 Dec 23, 2022
PyTorch implementation for SDEdit: Image Synthesis and Editing with Stochastic Differential Equations

SDEdit: Image Synthesis and Editing with Stochastic Differential Equations Project | Paper | Colab PyTorch implementation of SDEdit: Image Synthesis a

536 Jan 05, 2023
基于tensorflow 2.x的图片识别工具集

Classification.tf2 基于tensorflow 2.x的图片识别工具集 功能 粗粒度场景图片分类 细粒度场景图片分类 其他场景图片分类 模型部署 tensorflow serving本地推理和docker部署 tensorRT onnx ... 数据集 https://hyper.a

Wei Qi 1 Nov 03, 2021
use machine learning to recognize gesture on raspberrypi

Raspberrypi_Gesture-Recognition use machine learning to recognize gesture on raspberrypi 說明 利用 tensorflow lite 訓練手部辨識模型 分辨 "剪刀"、"石頭"、"布" 之手勢 再將訓練模型匯入

1 Dec 10, 2021
SLAMP: Stochastic Latent Appearance and Motion Prediction

SLAMP: Stochastic Latent Appearance and Motion Prediction Official implementation of the paper SLAMP: Stochastic Latent Appearance and Motion Predicti

Kaan Akan 34 Dec 08, 2022
Analysing poker data from home games with friends

Poker Game Analysis Analysing poker data from home games with friends. Not a lot of data is collected, so this project is primarily focussed on descri

Stavros Karmaniolos 1 Oct 15, 2022
Materials for upcoming beginner-friendly PyTorch course (work in progress).

Learn PyTorch for Deep Learning (work in progress) I'd like to learn PyTorch. So I'm going to use this repo to: Add what I've learned. Teach others in

Daniel Bourke 2.3k Dec 29, 2022
PoseCamera is python based SDK for human pose estimation through RGB webcam.

PoseCamera PoseCamera is python based SDK for human pose estimation through RGB webcam. Install install posecamera package through pip pip install pos

WonderTree 7 Jul 20, 2021
Code for models used in Bashiri et al., "A Flow-based latent state generative model of neural population responses to natural images".

A Flow-based latent state generative model of neural population responses to natural images Code for "A Flow-based latent state generative model of ne

Sinz Lab 5 Aug 26, 2022
Net2net - Network-to-Network Translation with Conditional Invertible Neural Networks

Net2Net Code accompanying the NeurIPS 2020 oral paper Network-to-Network Translation with Conditional Invertible Neural Networks Robin Rombach*, Patri

CompVis Heidelberg 206 Dec 20, 2022
Official PyTorch implementation of BlobGAN: Spatially Disentangled Scene Representations

BlobGAN: Spatially Disentangled Scene Representations Official PyTorch Implementation Paper | Project Page | Video | Interactive Demo BlobGAN.mp4 This

148 Dec 29, 2022
Deep Learning Package based on TensorFlow

White-Box-Layer is a Python module for deep learning built on top of TensorFlow and is distributed under the MIT license. The project was started in M

YeongHyeon Park 7 Dec 27, 2021
Resilience from Diversity: Population-based approach to harden models against adversarial attacks

Resilience from Diversity: Population-based approach to harden models against adversarial attacks Requirements To install requirements: pip install -r

0 Nov 23, 2021
Code of the paper "Part Detector Discovery in Deep Convolutional Neural Networks" by Marcel Simon, Erik Rodner and Joachim Denzler

Part Detector Discovery This is the code used in our paper "Part Detector Discovery in Deep Convolutional Neural Networks" by Marcel Simon, Erik Rodne

Computer Vision Group Jena 17 Feb 22, 2022
Awesome-AI-books - Some awesome AI related books and pdfs for learning and downloading

Awesome AI books Some awesome AI related books and pdfs for downloading and learning. Preface This repo only used for learning, do not use in business

luckyzhou 1k Jan 01, 2023
[CVPR 2021] MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition (CVPR 2021) arXiv Prerequisite PyTorch = 1.2.0 Python3 torchvision PIL argpar

51 Nov 11, 2022
Pytorch implementation of One-Shot Affordance Detection

One-shot Affordance Detection PyTorch implementation of our one-shot affordance detection models. This repository contains PyTorch evaluation code, tr

46 Dec 12, 2022