Semantic Bottleneck Scene Generation

SB-GAN


Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the flexibility of unconditional generative models, we propose a semantic bottleneck GAN model for unconditional synthesis of complex scenes. We assume pixel-wise segmentation labels are available during training and use them to learn the scene structure. During inference, our model first synthesizes a realistic segmentation layout from scratch, then synthesizes a realistic scene conditioned on that layout. For the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts. For the latter, we use a conditional segmentation-to-image synthesis network that captures the distribution of photo-realistic images conditioned on the semantic layout. When trained end-to-end, the resulting model outperforms state-of-the-art generative models in unsupervised image synthesis on two challenging domains in terms of the Frechet Inception Distance and user-study evaluations. Moreover, we demonstrate the generated segmentation maps can be used as additional training data to strongly improve recent segmentation-to-image synthesis networks.
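As a compact restatement of the two inference stages described above, sampling can be written as follows. This is only a notational sketch: the symbols $z$, $p(z)$, $G_{\mathrm{seg}}$ and $G_{\mathrm{img}}$ are illustrative names chosen here, not identifiers from the paper or the code, and any additional noise input to the image generator is omitted.

```latex
z \sim p(z), \qquad
\hat{s} = G_{\mathrm{seg}}(z), \qquad
\hat{x} = G_{\mathrm{img}}(\hat{s})
```

Here $\hat{s}$ stands for the synthesized segmentation layout (the semantic bottleneck) and $\hat{x}$ for the final image conditioned on it.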

Paper

[Paper 3.5MB]  [arXiv]

Code

Prerequisites:

  • NVIDIA GPU with CUDA and cuDNN
  • Python 3.6
  • PyTorch 1.0
  • Install the dependencies with:
pip install -r requirements.txt
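
To compare an environment against the prerequisites above, a small check along these lines may help. It is a minimal sketch: `check_env` is a hypothetical helper, not part of the SB-GAN repository, and it only reports versions rather than enforcing them.

```shell
# Hypothetical helper (not part of the SB-GAN repo): report the Python and
# PyTorch versions and whether PyTorch can see a CUDA device.
check_env() {
    local py="${1:-python}"   # interpreter to inspect; defaults to "python"
    if ! command -v "$py" >/dev/null 2>&1; then
        echo "python: not found"
        return 1
    fi
    "$py" -c 'import sys; print("python: %d.%d" % sys.version_info[:2])'
    # If PyTorch cannot be imported, report that instead of failing.
    "$py" -c 'import torch; print("torch: " + torch.__version__); print("cuda available: " + str(torch.cuda.is_available()))' 2>/dev/null \
        || echo "torch: not installed"
}
```

Calling `check_env` (or `check_env python3`) prints one line per item, which can then be compared by eye with the versions listed above.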

Preparation

  • Clone this repo with its submodules, then copy the Synchronized-BatchNorm-PyTorch module into SPADE:
git clone --recurse-submodules -j8 https://github.com/azadis/SB-GAN.git
cd SB-GAN/SPADE/models/networks/
git clone https://github.com/vacancy/Synchronized-BatchNorm-PyTorch
cp -rf Synchronized-BatchNorm-PyTorch/sync_batchnorm .
cd ../../../../

Datasets

ADE-Indoor

  • To download the indoor images from the ADE20K dataset and the corresponding segmentation maps used in our paper:
cd SB-GAN
bash SBGAN/datasets/download_ade.sh
cd ..

Cityscapes

cd SB-GAN/SBGAN/datasets
mkdir cityscapes
cd cityscapes
  • Download and unzip leftImg8bit_trainvaltest.zip and gtFine_trainvaltest.zip from the Cityscapes webpage, then move the extracted folders into place:
mv leftImg8bit_trainvaltest/leftImg8bit ./
mv gtFine_trainvaltest/gtFine ./
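
After the two `mv` commands, the layout can be sanity-checked with a sketch like the one below. `missing_cityscapes_dirs` is a hypothetical helper, not part of the repository, and the `train`/`val`/`test` split folders are an assumption based on the standard structure of the Cityscapes archives.

```shell
# Hypothetical helper (not part of the SB-GAN repo): print every expected
# Cityscapes directory that is missing under the given root.
# Assumes the standard archive layout: <root>/{leftImg8bit,gtFine}/{train,val,test}.
missing_cityscapes_dirs() {
    local root="$1" top split
    for top in leftImg8bit gtFine; do
        for split in train val test; do
            [ -d "$root/$top/$split" ] || echo "$top/$split"
        done
    done
}
```

Run from inside the `cityscapes` directory as `missing_cityscapes_dirs .`; empty output means all six directories were found.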

Cityscapes-25k

  • In addition to the 5K portion already downloaded, download and unzip leftImg8bit_trainextra.zip. The fine annotations we used in our paper for these 20K images can be downloaded with:
wget https://people.eecs.berkeley.edu/~sazadi/SBGAN/datasets/drn_d_105_000_test.tar.gz
tar -xzvf drn_d_105_000_test.tar.gz

These annotations are predicted by a DRN trained on the 5K fine-annotated portion of Cityscapes with 19 semantic categories. The new fine annotations of the 5K portion, with the same 19 semantic classes, can also be downloaded with:

wget https://people.eecs.berkeley.edu/~sazadi/SBGAN/datasets/gtFine_new.tar.gz
tar -xzvf gtFine_new.tar.gz
cd ../../../..

Training

cd SB-GAN/SBGAN

  • For each $dataset in ade_indoor, cityscapes, and cityscapes_25k:
  1. Semantic bottleneck synthesis:
bash SBGAN/scripts/$dataset/train_progressive_seg.sh
  2. Semantic image synthesis:
cd ../SPADE
bash scripts/$dataset/train_spade.sh
  3. Train the end2end SBGAN model:
cd ../SBGAN
bash SBGAN/scripts/$dataset/train_finetune_end2end.sh
  • In the above script, set $pro_iter to the iteration number of the checkpoint saved from step 1 that you want to use before fine-tuning. Also, set $spade_epoch to the last epoch saved for SPADE from step 2.
  • To visualize a training run started in step 1 or step 3 at ${date-time}, run the following commands, then open http://localhost:6006/ in your web browser.
cd SBGAN/logs/${date-time}
tensorboard --logdir=. --port=6006
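
The order of the three training stages for a given dataset can be summarized with a dry-run sketch like the following. `print_training_plan` is a hypothetical helper, not part of the repository; it only prints commands and never runs them, and the directory name is assumed to be `scripts` throughout.

```shell
# Hypothetical dry-run helper (not part of the SB-GAN repo): print the three
# training commands for one dataset, in order, without executing anything.
print_training_plan() {
    local dataset="$1"
    case "$dataset" in
        ade_indoor|cityscapes|cityscapes_25k) ;;
        *) echo "unknown dataset: $dataset" >&2; return 1 ;;
    esac
    # Stage 1: semantic bottleneck synthesis (run from SB-GAN/SBGAN).
    echo "bash SBGAN/scripts/$dataset/train_progressive_seg.sh"
    # Stage 2: semantic image synthesis (run from SB-GAN/SPADE).
    echo "bash scripts/$dataset/train_spade.sh"
    # Stage 3: end2end fine-tuning (run from SB-GAN/SBGAN, after setting
    # pro_iter and spade_epoch inside the script).
    echo "bash SBGAN/scripts/$dataset/train_finetune_end2end.sh"
}
```

For example, `print_training_plan ade_indoor` lists the three commands for ADE-Indoor. The stages are deliberately not chained, since the checkpoint variables for stage 3 have to be set by hand once stages 1 and 2 have finished.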

Testing

To compute the FID after training the end2end model, run the following for each $dataset:

bash SBGAN/scripts/$dataset/test_finetune_end2end.sh
  • In the above script, set $pro_iter and $spade_epoch to the appropriate checkpoints saved from your end2end training.
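
For reference, the Frechet Inception Distance is conventionally defined between Gaussians fitted to Inception features of real and generated images. The notation below is the standard one and is an assumption here, not taken from this repository's test script: $\mu_r, \Sigma_r$ denote the feature mean and covariance for real images, and $\mu_g, \Sigma_g$ those for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the real one.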

Citation

If you use this code, please cite our paper:

@article{azadi2019semantic,
  title={Semantic Bottleneck Scene Generation},
  author={Azadi, Samaneh and Tschannen, Michael and Tzeng, Eric and Gelly, Sylvain and Darrell, Trevor and Lucic, Mario},
  journal={arXiv preprint arXiv:1911.11357},
  year={2019}
}

Owner

Samaneh Azadi, CS PhD student at UC Berkeley