StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

Last update: Dec 21, 2022

StackGAN

TensorFlow implementation for reproducing the main results in the paper StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks by Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas.

Dependencies

Python 2.7

TensorFlow 0.12

[Optional] Torch is needed if you use the pre-trained char-CNN-RNN text encoder.

[Optional] skip-thought is needed if you use the skip-thought text encoder.

In addition, please add the project folder to PYTHONPATH and pip install the following packages:

prettytensor
progressbar
python-dateutil
easydict
pandas
torchfile
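
As a quick sanity check before training, the following sketch tries to import each package from the list above and reports the ones that fail. The helper name `missing_packages` is hypothetical and not part of the StackGAN code. Note that python-dateutil is imported under the module name `dateutil`.

```python
from __future__ import print_function

import importlib

# Import names for the pip packages listed above.
# python-dateutil installs the module `dateutil`.
REQUIRED_MODULES = [
    "prettytensor",
    "progressbar",
    "dateutil",
    "easydict",
    "pandas",
    "torchfile",
]


def missing_packages(module_names):
    """Return the module names that cannot be imported (hypothetical helper)."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Run it with the same interpreter you plan to train with, so that the check reflects the environment that will actually be used.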

Data

Download our preprocessed char-CNN-RNN text embeddings for birds and flowers and save them to Data/.

[Optional] Follow the instructions in reedscot/icml2016 to download the pretrained char-CNN-RNN text encoders and extract text embeddings.

Download the birds and flowers image data. Extract them to Data/birds/ and Data/flowers/, respectively.
Preprocess images.

For birds: python misc/preprocess_birds.py
For flowers: python misc/preprocess_flowers.py
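
Before running the preprocessing scripts, it can help to confirm that the directories named above exist. The sketch below only checks the directory names mentioned in this section; the helper `missing_dirs` is hypothetical, and the files inside each directory are not checked because their names are not listed here.

```python
from __future__ import print_function

import os

# Directories named in the data instructions above, relative to the project root.
EXPECTED_DIRS = [
    "Data",
    os.path.join("Data", "birds"),
    os.path.join("Data", "flowers"),
]


def missing_dirs(project_root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under project_root."""
    return [d for d in expected
            if not os.path.isdir(os.path.join(project_root, d))]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected data directories are present.")
```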

Training

To train a StackGAN model on the CUB dataset using our preprocessed data for birds:
- Step 1: train the Stage-I GAN (e.g., for 600 epochs): python stageI/run_exp.py --cfg stageI/cfg/birds.yml --gpu 0
- Step 2: train the Stage-II GAN (e.g., for another 600 epochs): python stageII/run_exp.py --cfg stageII/cfg/birds.yml --gpu 1
Change birds.yml to flowers.yml to train a StackGAN model on the Oxford-102 dataset using our preprocessed data for flowers.
*.yml files are example configuration files for training/testing our models.
If you want to try your own datasets, it is worth reading up on general tips for training GANs first. We also encourage you to try different hyper-parameters and architectures, especially for more complex datasets.
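
The two training steps above can be chained from a small driver script. This is a minimal sketch, not part of the repository: the function names `stage_command` and `train_stackgan` are hypothetical, and it assumes both stages run on the same GPU and that the script is started from the project root. Only the command lines themselves are taken from the steps above.

```python
from __future__ import print_function

import subprocess


def stage_command(stage, dataset, gpu):
    """Build the training command for one stage.

    stage is "stageI" or "stageII"; dataset is "birds" or "flowers".
    """
    return [
        "python",
        "{}/run_exp.py".format(stage),
        "--cfg", "{}/cfg/{}.yml".format(stage, dataset),
        "--gpu", str(gpu),
    ]


def train_stackgan(dataset, gpu=0, dry_run=True):
    """Run Stage-I, then Stage-II; with dry_run=True only print the commands."""
    commands = [stage_command(s, dataset, gpu) for s in ("stageI", "stageII")]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check_call raises CalledProcessError if a stage exits with an error,
            # so Stage-II is not started after a failed Stage-I run.
            subprocess.check_call(cmd)
    return commands


if __name__ == "__main__":
    train_stackgan("birds", gpu=0, dry_run=True)
```

With `dry_run=True` the script only prints the two commands, which is a cheap way to confirm the configuration paths before committing to a long training run.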

Pretrained Model

StackGAN for birds trained from char-CNN-RNN text embeddings. Download and save it to models/.
StackGAN for flowers trained from char-CNN-RNN text embeddings. Download and save it to models/.
StackGAN for birds trained from skip-thought text embeddings. Download and save it to models/. (This model uses the same settings as the char-CNN-RNN one; we expect that better results can be achieved by tuning the hyper-parameters.)

Run Demos

Run sh demo/flowers_demo.sh to generate flower samples from sentences. The results will be saved to Data/flowers/example_captions/. (You need to download the char-CNN-RNN text encoder for flowers to models/text_encoder/ first. Note: this text encoder is provided by reedscot/icml2016.)
Run sh demo/birds_demo.sh to generate bird samples from sentences. The results will be saved to Data/birds/example_captions/. (You need to download the char-CNN-RNN text encoder for birds to models/text_encoder/ first. Note: this text encoder is provided by reedscot/icml2016.)
Run python demo/birds_skip_thought_demo.py --cfg demo/cfg/birds-skip-thought-demo.yml --gpu 2 to generate bird samples from sentences. The results will be saved to Data/birds/example_captions-skip-thought/. (You need to download the vocabulary for skip-thought vectors to Data/skipthoughts/ first.)

Examples for birds (char-CNN-RNN embeddings), more on youtube:

Examples for flowers (char-CNN-RNN embeddings), more on youtube:

Save your favorite pictures generated by our models, since the randomness from the noise z and conditioning augmentation makes them creative enough to generate objects with different poses and viewpoints from the same description 😃
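
To illustrate why the same description can yield different images, the sketch below mimics conditioning augmentation in plain Python: a conditioning vector is drawn as c = mu + sigma * eps with eps sampled from a standard normal, and a separate noise vector z is drawn alongside it. This is a toy illustration under stated assumptions, not the repository's code: in the model, mu and sigma are predicted from the text embedding by a learned layer, whereas here `mu` and `log_sigma` are fixed, made-up values, and the function name `sample_conditioning` is hypothetical.

```python
from __future__ import print_function

import math
import random


def sample_conditioning(mu, log_sigma, rng):
    """Draw c = mu + exp(log_sigma) * eps, with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0)
            for m, ls in zip(mu, log_sigma)]


if __name__ == "__main__":
    rng = random.Random(0)

    # Made-up statistics standing in for what the model would predict
    # from one text embedding.
    mu = [0.5, -1.0, 0.25, 2.0]
    log_sigma = [-1.0, -0.5, -2.0, -1.5]

    # Two draws for the same description give different conditioning vectors,
    # and each draw is paired with its own noise vector z.
    for i in range(2):
        c = sample_conditioning(mu, log_sigma, rng)
        z = [rng.gauss(0.0, 1.0) for _ in range(4)]
        print("sample", i, "c =", c, "z =", z)
```

Because both c and z change on every draw, the generator receives a different input each time even though the sentence is unchanged.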

Citing StackGAN

If you find StackGAN useful in your research, please consider citing:

@inproceedings{han2017stackgan,
Author = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
Title = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
Year = {2017},
booktitle = {{ICCV}},
}

Our follow-up work

References

Generative Adversarial Text-to-Image Synthesis Paper Code
Learning Deep Representations of Fine-grained Visual Descriptions Paper Code

Owner

Han Zhang