NP DRAW paper released code

Related tags

Deep LearningNPDRAW
Overview

NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation

This repo contains the official implementation for the NP-DRAW paper.

by Xiaohui Zeng, Raquel Urtasun, Richard Zemel, Sanja Fidler, and Renjie Liao

Abstract

In this paper, we present a non-parametric structured latent variable model for image generation, called NP-DRAW, which sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas. Our key contributions are as follows.

  1. We propose a non-parametric prior distribution over the appearance of image parts so that the latent variable “what-to-draw” per step becomes a categorical random variable. This improves the expressiveness and greatly eases the learning compared to Gaussians used in the literature.
  2. We model the sequential dependency structure of parts via a Transformer, which is more powerful and easier to train compared to RNNs used in the literature.
  3. We propose an effective heuristic parsing algorithm to pre-train the prior. Experiments on MNIST, Omniglot, CIFAR-10, and CelebA show that our method significantly outperforms previous structured image models like DRAW and AIR and is competitive to other generic generative models.

Moreover, we show that our model’s inherent compositionality and interpretability bring significant benefits in the low-data learning regime and latent space editing.

Generation Process

prior

Our prior generate "whether", "where" and "what" to draw per step. If the "whether-to-draw" is true, a patch from the part bank is selected and pasted to the canvas. The final canvas is refined by our decoder.

More visualization of the canvas and images

twitter-1page

Latent Space Editting

We demonstrate the advantage of our interpretable latent space via interactively editing/composing the latent canvas.

edit

  • Given images A and B, we encode them to obtain the latent canvases. Then we compose a new canvas by placing certain semantically meaningful parts (e.g., eyeglasses, hair, beard, face) from canvas B on top of canvas A. Finally, we decode an image using the composed canvas.

Dependencies

# the following command will install torch 1.6.0 and other required packages 
conda env create -f environment.yml # edit the last link in the yml file for the directory
conda activate npdraw 

Pretrained Model

Pretrained model will be available here To use the pretrained models, download the zip file under exp folder and unzip it. For expample, with the cifar.zip file we will get ./exp/cifarcm/cat_vloc_at/ and ./exp/cnn_prior/cifar/.

Testing the pretrained NPDRAW model:

  • before running the evaluation, please also download the stats on the test set from google-drive, and run
mkdir datasets 
mv images.tar.gz datasets 
cd datasets 
tar xzf images.tar.gz 

The following commands test the FID score of the NPDRAW model.

# for mnist
bash scripts/local_sample.sh exp/stoch_mnist/cat_vloc_at/0208/p5s5n36vitBinkl1r1E3_K50w5sc0_gs_difflr_b500/E00550.pth # FID 2.55

# for omniglot
bash scripts/local_sample.sh exp/omni/cat_vloc_at/0208/p5s5n36vitBinkl1r1E3_K50w5sc0_gs_difflr_b500/ckpt_epo799.pth # FID 5.53

# for cifar
bash scripts/local_sample.sh exp/cifarcm/cat_vloc_at/0208/p4s4n64_vitcnnLkl11E3_K200w4sc0_gs_difflr_b150/ckpt_epo499.pth #

# for celeba
bash scripts/local_sample.sh exp/celebac32/cat_vloc_at/0208/p4s4n64_vitcnnLkl0e531E3_K200w4sc0_gs_difflr_b150/ckpt_epo199.pth # FID 41.29

Training

Use ./scripts/train_$DATASET.sh to train the model.


  • The code in tool/pytorch-fid/ is adapated from here
  • The transformer code is adapted from here
Owner
ZENG Xiaohui
ZENG Xiaohui
Framework for training options with different attention mechanism and using them to solve downstream tasks.

Using Attention in HRL Framework for training options with different attention mechanism and using them to solve downstream tasks. Requirements GPU re

5 Nov 03, 2022
Exposure Time Calculator (ETC) and radial velocity precision estimator for the Near InfraRed Planet Searcher (NIRPS) spectrograph

NIRPS-ETC Exposure Time Calculator (ETC) and radial velocity precision estimator for the Near InfraRed Planet Searcher (NIRPS) spectrograph February 2

Nolan Grieves 2 Sep 15, 2022
TensorFlow implementation of Deep Reinforcement Learning papers

Deep Reinforcement Learning in TensorFlow TensorFlow implementation of Deep Reinforcement Learning papers. This implementation contains: [1] Playing A

Taehoon Kim 1.6k Jan 03, 2023
DiAne is a smart fuzzer for IoT devices

Diane Diane is a fuzzer for IoT devices. Diane works by identifying fuzzing triggers in the IoT companion apps to produce valid yet under-constrained

seclab 28 Jan 04, 2023
A new benchmark for Icon Question Answering (IconQA) and a large-scale icon dataset Icon645.

IconQA About IconQA is a new diverse abstract visual question answering dataset that highlights the importance of abstract diagram understanding and c

Pan Lu 24 Dec 30, 2022
Official implementation of "Robust channel-wise illumination estimation"

This repository provides the official implementation of "Robust channel-wise illumination estimation." accepted in BMVC (2021).

Firas Laakom 4 Nov 08, 2022
Using NumPy to solve the equations of fluid mechanics together with Finite Differences, explicit time stepping and Chorin's Projection methods

Computational Fluid Dynamics in Python Using NumPy to solve the equations of fluid mechanics 🌊 🌊 🌊 together with Finite Differences, explicit time

Felix Köhler 4 Nov 12, 2022
MapReader: A computer vision pipeline for the semantic exploration of maps at scale

MapReader A computer vision pipeline for the semantic exploration of maps at scale MapReader is an end-to-end computer vision (CV) pipeline designed b

Living with Machines 25 Dec 26, 2022
The source codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data This repository provides the implementation details for

124 Dec 27, 2022
A project which aims to protect your privacy using inexpensive hardware and easily modifiable software

Protecting your privacy using an ESP32, an IR sensor and a python script This project, which I personally call the "never-gonna-catch-me-in-the-act-ev

8 Oct 10, 2022
RARA: Zero-shot Sim2Real Visual Navigation with Following Foreground Cues

RARA: Zero-shot Sim2Real Visual Navigation with Following Foreground Cues FGBG (foreground-background) pytorch package for defining and training model

Klaas Kelchtermans 1 Jun 02, 2022
A repository for benchmarking neural vocoders by their quality and speed.

License The majority of VocBench is licensed under CC-BY-NC, however portions of the project are available under separate license terms: Wavenet, Para

Meta Research 177 Dec 12, 2022
PointCloud Annotation Tools, support to label object bound box, ground, lane and kerb

PointCloud Annotation Tools, support to label object bound box, ground, lane and kerb

halo 368 Dec 06, 2022
Fewshot-face-translation-GAN - Generative adversarial networks integrating modules from FUNIT and SPADE for face-swapping.

Few-shot face translation A GAN based approach for one model to swap them all. The table below shows our priliminary face-swapping results requiring o

768 Dec 24, 2022
[NeurIPS 2021] Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data (NeurIPS 2021) This repository will provide the official PyTorch implementa

Liming Jiang 238 Nov 25, 2022
Patches desktop steam to look like the new steamdeck ui.

steam_deck_ui_patch The Deck UI patch will patch the regular desktop steam to look like the brand new SteamDeck UI. This patch tool currently works on

The_IT_Dude 3 Aug 29, 2022
Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch

Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch

Phil Wang 383 Jan 02, 2023
Code for Domain Adaptive Video Segmentation via Temporal Consistency Regularization in ICCV 2021

Domain Adaptive Video Segmentation via Temporal Consistency Regularization Updates 08/2021: check out our domain adaptation for sematic segmentation p

36 Dec 12, 2022
Minimisation of a negative log likelihood fit to extract the lifetime of the D^0 meson (MNLL2ELDM)

Minimisation of a negative log likelihood fit to extract the lifetime of the D^0 meson (MNLL2ELDM) Introduction The average lifetime of the $D^{0}$ me

Son Gyo Jung 1 Dec 17, 2021
Implementing DeepMind's Fast Reinforcement Learning paper

Fast Reinforcement Learning This is a repo where I implement the algorithms in the paper, Fast reinforcement learning with generalized policy updates.

Marcus Chiam 6 Nov 28, 2022