Generative Adversarial Text-to-Image Synthesis

Last update: Dec 31, 2022

### Generative Adversarial Text-to-Image Synthesis

Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee

This is the code for our ICML 2016 paper on text-to-image synthesis using conditional GANs. You can use it to train and sample from text-to-image models. The code is adapted from the excellent dcgan.torch.
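As a rough illustration of what "conditional" means here, the discriminator scores an image together with a text embedding, so it can penalize both unrealistic images and realistic images that do not match their text. The sketch below follows the matching-aware objective as described in the paper, written as losses to minimize. It is a minimal sketch in plain Python, not the repository's Torch training code, and the function and argument names are hypothetical.

```python
import math


def discriminator_loss(s_real, s_wrong, s_fake, eps=1e-12):
    """Loss for the discriminator, given three scores in (0, 1).

    s_real:  score for (real image, matching text)
    s_wrong: score for (real image, mismatched text)
    s_fake:  score for (generated image, matching text)
    """
    real_term = -math.log(s_real + eps)
    wrong_term = -math.log(1.0 - s_wrong + eps)
    fake_term = -math.log(1.0 - s_fake + eps)
    # The two "should be rejected" cases share the weight of the real case.
    return real_term + 0.5 * (wrong_term + fake_term)


def generator_loss(s_fake, eps=1e-12):
    """Loss for the generator: push the score of its samples toward 1."""
    return -math.log(s_fake + eps)
```

A discriminator that accepts matching pairs and rejects the other two cases gets a lower loss than one that outputs 0.5 everywhere, and the generator's loss falls as its samples score higher.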

#### Setup Instructions

You will need to install Torch, cuDNN, and the display package.

#### How to train a text-to-image model:

1. Download the birds, flowers, and COCO caption data in Torch format.
2. Download the birds, flowers, and COCO image data.
3. Download the text encoders for the birds, flowers, and COCO descriptions.
4. Modify the CONFIG file to point to your data and text encoder paths.
5. Run one of the training scripts, e.g. ./scripts/train_cub.sh
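Before launching a training script, it can help to confirm that every path in CONFIG actually exists. The sketch below assumes CONFIG is a plain file of `KEY=VALUE` lines (optionally prefixed with `export`) and that every value is a filesystem path. Both are assumptions about the file layout, and the helper names are hypothetical.

```python
from pathlib import Path


def parse_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, value = line.split("=", 1)
        # Drop surrounding quotes so the value can be used as a path.
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


def missing_paths(config):
    """Return the sorted keys whose values do not exist on disk."""
    return sorted(key for key, value in config.items()
                  if not Path(value).exists())


if __name__ == "__main__":
    config_file = Path("CONFIG")  # assumed to sit in the current directory
    if config_file.exists():
        for key in missing_paths(parse_config(config_file.read_text())):
            print("missing path for", key)
```

If nothing is printed, every configured path was found and a script such as ./scripts/train_cub.sh has the inputs it points to.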

#### How to generate samples:

- For flowers: ./scripts/demo_flowers.sh. Add text descriptions to scripts/flowers_queries.txt.
- For birds: ./scripts/demo_cub.sh.
- For COCO (more general images): ./scripts/demo_coco.sh.

An HTML file will be generated with the results.
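Adding descriptions to a queries file can be scripted. The sketch below assumes the file holds one description per line, which is an assumption about its format, and `append_queries` is a hypothetical helper rather than part of this repository.

```python
from pathlib import Path


def append_queries(path, descriptions):
    """Append non-empty descriptions, one per line; return how many were added."""
    # Collapse internal whitespace so each description stays on a single line.
    cleaned = [" ".join(text.split()) for text in descriptions]
    cleaned = [text for text in cleaned if text]
    if not cleaned:
        return 0
    path = Path(path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    # Start on a fresh line if the file does not already end with a newline.
    prefix = "" if not existing or existing.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(prefix + "".join(text + "\n" for text in cleaned))
    return len(cleaned)


if __name__ == "__main__":
    added = append_queries("scripts/flowers_queries.txt",
                           ["a flower with long pink petals"])
    print("added", added, "description(s)")
```

After updating the file, rerun ./scripts/demo_flowers.sh to regenerate the results page.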

#### Pretrained models:

#### How to train a text encoder from scratch:

You may want to do this if you have your own new dataset of text descriptions.

- For flowers and birds: follow the instructions here.
- For MS-COCO: ./scripts/train_coco_txt.sh.

#### Citation

If you find this useful, please cite our work as follows:

@inproceedings{reed2016generative,
  title={Generative Adversarial Text-to-Image Synthesis},
  author={Scott Reed and Zeynep Akata and Xinchen Yan and Lajanugen Logeswaran and Bernt Schiele and Honglak Lee},
  booktitle={Proceedings of The 33rd International Conference on Machine Learning},
  year={2016}
}

Owner

Scott Ellison Reed
