PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Last update: Dec 28, 2022

Overview

Make-A-Scene - PyTorch

Unofficial PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/pdf/2203.13131.pdf)

Figure 1 from the paper.

Note: this is a work in progress.

Everyone is welcome to contribute. Discord channel: https://discord.gg/hCRMGRZkC6

We would love to open-source a trained model. However, it is a billion-parameter model, and training it requires a lot of compute. If you can provide computational resources, please let us know.

Paper Description:

Make-A-Scene modifies the VQGAN framework. It relies heavily on semantic segmentation maps as extra conditioning, which gives more control over the generation process. Moreover, it also conditions on text. The main improvements are the following:

Segmentation conditioning: a separate VQ-VAE (VQ-SEG) is trained on the segmentation maps, with the loss changed to a weighted binary cross-entropy (Section 3.4).
VQGAN training (VQ-IMG) is extended with a face loss and an object loss (Sections 3.3 and 3.5).
Classifier-free guidance for the autoregressive transformer (Section 3.7).
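
As a rough illustration of two of the items above, the sketch below shows a per-channel weighted binary cross-entropy and the logit mixing used for transformer guidance (which the paper formulates as classifier-free guidance). This is not the code from this repository. The function names, the list-based data layout, and the weight and scale values are assumptions made for the example, and it is written in plain Python so it runs without dependencies. In practice these would be tensor operations in PyTorch.

```python
import math


def weighted_bce(probs, targets, channel_weights, eps=1e-7):
    """Mean binary cross-entropy with one weight per segmentation channel.

    probs, targets: list of channels, each a flat list of values in [0, 1].
    channel_weights: one weight per channel (hypothetical values; a larger
    weight makes errors on that channel cost more).
    """
    total, count = 0.0, 0
    for p_ch, t_ch, w in zip(probs, targets, channel_weights):
        for p, t in zip(p_ch, t_ch):
            p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
            total += -w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count


def guided_logits(cond_logits, uncond_logits, scale):
    """Mix conditional and unconditional next-token logits.

    scale = 0 returns the unconditional logits, scale = 1 the conditional
    ones, and scale > 1 pushes further in the direction of the condition.
    """
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]
```

For example, `weighted_bce(probs, targets, [1.0, 5.0])` would penalise errors on the second channel five times as much as on the first, and `guided_logits(cond, uncond, 3.0)` would be applied to the transformer's logits at each sampling step before the softmax.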

Training Pipeline

Figure 6 from the paper.

What needs to be done?

See the individual folders for details.

Citation

@misc{https://doi.org/10.48550/arxiv.2203.13131,
  doi = {10.48550/ARXIV.2203.13131},
  url = {https://arxiv.org/abs/2203.13131},
  author = {Gafni, Oran and Polyak, Adam and Ashual, Oron and Sheynin, Shelly and Parikh, Devi and Taigman, Yaniv},
  title = {Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Owner

Casual GAN Papers
