Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

Overview

VQGAN-CLIP-Docker

About

Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

This is a stripped and minimal dependency repository for running locally or in production VQGAN+CLIP.

For a Google Colab notebook see the original repository.

Samples

Setup

Clone this repository and cd inside.

git clone https://github.com/kcosta42/VQGAN-CLIP-Docker.git
cd VQGAN-CLIP-Docker

Download a VQGAN model and put it in the ./models folder.

Dataset Link
ImageNet (f=16), 16384 vqgan_imagenet_f16_16384

For GPU capability, make sure you have CUDA installed on your system (tested with CUDA 11.1+).

  • 6 GB of VRAM is required to generate 256x256 images.
  • 11 GB of VRAM is required to generate 512x512 images.
  • 24 GB of VRAM is required to generate 1024x1024 images. (Untested)

Local

Install the Python requirements

python3 -m pip install -r requirements.txt

To know if you can run this on your GPU, the following command must return True.

python3 -c "import torch; print(torch.cuda.is_available());"

Docker

Make sure you have docker and docker-compose installed. nvidia-docker is needed if you want to run this on your GPU through Docker.

A Makefile is provided for ease of use.

make build  # Build the docker image

Usage

Two configuration file are provided ./configs/local.json and ./configs/docker.json. They are ready to go, but you may want to edit them to meet your need. Check the Configuration section to understand each field.

The resulting generations can be found in the ./outputs folder.

GPU

To run locally:

python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate

CPU

To run locally:

DEVICE=cpu python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate-cpu

Configuration

Argument Type Descriptions
prompts List[str] Text prompts
image_prompts List[FilePath] Image prompts / target image path
max_iterations int Number of iterations
save_freq int Save image iterations
size [int, int] Image size (width height)
init_image FilePath Initial image
init_noise str Initial noise image ['gradient','pixels']
init_weight float Initial weight
output_dir FilePath Path to output directory
models_dir FilePath Path to models cache directory
clip_model FilePath CLIP model path or name
vqgan_checkpoint FilePath VQGAN checkpoint path
vqgan_config FilePath VQGAN config path
noise_prompt_seeds List[int] Noise prompt seeds
noise_prompt_weights List[float] Noise prompt weights
step_size float Learning rate
cutn int Number of cuts
cut_pow float Cut power
seed int Seed (-1 for random seed)
optimizer str Optimiser ['Adam','AdamW','Adagrad','Adamax','DiffGrad','AdamP','RAdam']
augments List[str] Enabled augments ['Ji','Sh','Gn','Pe','Ro','Af','Et','Ts','Cr','Er','Re']

Acknowledgments

VQGAN+CLIP

Taming Transformers

CLIP

DALLE-PyTorch

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford, Ilya Sutskever, Jong Wook Kim, Gretchen Krueger, Sandhini Agarwal},
    year   = {2021}
}
@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis},
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@misc{ramesh2021zeroshot,
    title   = {Zero-Shot Text-to-Image Generation},
    author  = {Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
    year    = {2021},
    eprint  = {2102.12092},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
Owner
Kevin Costa
Machine Learning Engineer. Previously Student @ 42 Paris
Kevin Costa
Simulator for FRC 2022 challenge: Rapid React

rrsim Simulator for FRC 2022 challenge: Rapid React out-1.mp4 Usage In order to run the simulator use the following: python3 rrsim.py [config_path] wh

1 Jan 18, 2022
GyroSPD: Vector-valued Distance and Gyrocalculus on the Space of Symmetric Positive Definite Matrices

GyroSPD Code for the paper "Vector-valued Distance and Gyrocalculus on the Space of Symmetric Positive Definite Matrices" accepted at NeurIPS 2021. Re

Federico Lopez 12 Dec 12, 2022
MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets)

MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets) Using mixup data augmentation as reguliraztion and tuning the hyper par

Bhanu 2 Jan 16, 2022
Image Captioning using CNN and Transformers

Image-Captioning Keras/Tensorflow Image Captioning application using CNN and Transformer as encoder/decoder. In particulary, the architecture consists

24 Dec 28, 2022
This project aims at providing a concise, easy-to-use, modifiable reference implementation for semantic segmentation models using PyTorch.

Semantic Segmentation on PyTorch (include FCN, PSPNet, Deeplabv3, Deeplabv3+, DANet, DenseASPP, BiSeNet, EncNet, DUNet, ICNet, ENet, OCNet, CCNet, PSANet, CGNet, ESPNet, LEDNet, DFANet)

2.4k Jan 08, 2023
[ACM MM 2021] Joint Implicit Image Function for Guided Depth Super-Resolution

Joint Implicit Image Function for Guided Depth Super-Resolution This repository contains the code for: Joint Implicit Image Function for Guided Depth

hawkey 78 Dec 27, 2022
Keras implementation of "One pixel attack for fooling deep neural networks" using differential evolution on Cifar10 and ImageNet

One Pixel Attack How simple is it to cause a deep neural network to misclassify an image if an attacker is only allowed to modify the color of one pix

Dan Kondratyuk 1.2k Dec 26, 2022
Composing methods for ML training efficiency

MosaicML Composer contains a library of methods, and ways to compose them together for more efficient ML training.

MosaicML 2.8k Jan 08, 2023
A high-level Python library for Quantum Natural Language Processing

lambeq About lambeq is a toolkit for quantum natural language processing (QNLP). Documentation: https://cqcl.github.io/lambeq/ User support: lambeq-su

Cambridge Quantum 315 Jan 01, 2023
PyTorch implementation of adversarial patch

adversarial-patch PyTorch implementation of adversarial patch This is an implementation of the Adversarial Patch paper. Not official and likely to hav

Jamie Hayes 172 Nov 29, 2022
Codes for the AAAI'22 paper "TransZero: Attribute-guided Transformer for Zero-Shot Learning"

TransZero [arXiv] This repository contains the testing code for the paper "TransZero: Attribute-guided Transformer for Zero-Shot Learning" accepted to

Shiming Chen 52 Jan 01, 2023
Code for the Convolutional Vision Transformer (ConViT)

ConViT : Vision Transformers with Convolutional Inductive Biases This repository contains PyTorch code for ConViT. It builds on code from the Data-Eff

Facebook Research 418 Jan 06, 2023
Physical Anomalous Trajectory or Motion (PHANTOM) Dataset

Physical Anomalous Trajectory or Motion (PHANTOM) Dataset Description This dataset contains the six different classes as described in our paper[]. The

0 Dec 16, 2021
Recognize Handwritten Digits using Deep Learning on the browser itself.

MNIST on the Web An attempt to predict MNIST handwritten digits from my PyTorch model from the browser (client-side) and not from the server, with the

Harjyot Bagga 7 May 28, 2022
Implement A3C for Mujoco gym envs

pytorch-a3c-mujoco Disclaimer: my implementation right now is unstable (you ca refer to the learning curve below), I'm not sure if it's my problems. A

Andrew 70 Dec 12, 2022
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

This repo is the official implementation of "Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework". @inproceedings{zhou2021insta

34 Dec 31, 2022
[NeurIPS 2021] Towards Better Understanding of Training Certifiably Robust Models against Adversarial Examples | ⛰️⚠️

Towards Better Understanding of Training Certifiably Robust Models against Adversarial Examples This repository is the official implementation of "Tow

Sungyoon Lee 4 Jul 12, 2022
Fuse radar and camera for detection

SAF-FCOS: Spatial Attention Fusion for Obstacle Detection using MmWave Radar and Vision Sensor This project hosts the code for implementing the SAF-FC

ChangShuo 18 Jan 01, 2023
A PyTorch-based Semi-Supervised Learning (SSL) Codebase for Pixel-wise (Pixel) Vision Tasks

PixelSSL is a PyTorch-based semi-supervised learning (SSL) codebase for pixel-wise (Pixel) vision tasks. The purpose of this project is to promote the

Zhanghan Ke 255 Dec 11, 2022
Cache Requests in Deta Bases and Echo them with Deta Micros

Deta Echo Cache Leverage the awesome Deta Micros and Deta Base to cache requests and echo them as needed. Stop worrying about slow public APIs or agre

Gingerbreadfork 8 Dec 07, 2021