Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

Overview

VQGAN-CLIP-Docker

About

Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

This is a stripped and minimal dependency repository for running locally or in production VQGAN+CLIP.

For a Google Colab notebook see the original repository.

Samples

Setup

Clone this repository and cd inside.

git clone https://github.com/kcosta42/VQGAN-CLIP-Docker.git
cd VQGAN-CLIP-Docker

Download a VQGAN model and put it in the ./models folder.

Dataset Link
ImageNet (f=16), 16384 vqgan_imagenet_f16_16384

For GPU capability, make sure you have CUDA installed on your system (tested with CUDA 11.1+).

  • 6 GB of VRAM is required to generate 256x256 images.
  • 11 GB of VRAM is required to generate 512x512 images.
  • 24 GB of VRAM is required to generate 1024x1024 images. (Untested)

Local

Install the Python requirements

python3 -m pip install -r requirements.txt

To know if you can run this on your GPU, the following command must return True.

python3 -c "import torch; print(torch.cuda.is_available());"

Docker

Make sure you have docker and docker-compose installed. nvidia-docker is needed if you want to run this on your GPU through Docker.

A Makefile is provided for ease of use.

make build  # Build the docker image

Usage

Two configuration file are provided ./configs/local.json and ./configs/docker.json. They are ready to go, but you may want to edit them to meet your need. Check the Configuration section to understand each field.

The resulting generations can be found in the ./outputs folder.

GPU

To run locally:

python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate

CPU

To run locally:

DEVICE=cpu python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate-cpu

Configuration

Argument Type Descriptions
prompts List[str] Text prompts
image_prompts List[FilePath] Image prompts / target image path
max_iterations int Number of iterations
save_freq int Save image iterations
size [int, int] Image size (width height)
init_image FilePath Initial image
init_noise str Initial noise image ['gradient','pixels']
init_weight float Initial weight
output_dir FilePath Path to output directory
models_dir FilePath Path to models cache directory
clip_model FilePath CLIP model path or name
vqgan_checkpoint FilePath VQGAN checkpoint path
vqgan_config FilePath VQGAN config path
noise_prompt_seeds List[int] Noise prompt seeds
noise_prompt_weights List[float] Noise prompt weights
step_size float Learning rate
cutn int Number of cuts
cut_pow float Cut power
seed int Seed (-1 for random seed)
optimizer str Optimiser ['Adam','AdamW','Adagrad','Adamax','DiffGrad','AdamP','RAdam']
augments List[str] Enabled augments ['Ji','Sh','Gn','Pe','Ro','Af','Et','Ts','Cr','Er','Re']

Acknowledgments

VQGAN+CLIP

Taming Transformers

CLIP

DALLE-PyTorch

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford, Ilya Sutskever, Jong Wook Kim, Gretchen Krueger, Sandhini Agarwal},
    year   = {2021}
}
@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis},
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@misc{ramesh2021zeroshot,
    title   = {Zero-Shot Text-to-Image Generation},
    author  = {Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
    year    = {2021},
    eprint  = {2102.12092},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
Owner
Kevin Costa
Machine Learning Engineer. Previously Student @ 42 Paris
Kevin Costa
Romanian Automatic Speech Recognition from the ROBIN project

RobinASR This repository contains Robin's Automatic Speech Recognition (RobinASR) for the Romanian language based on the DeepSpeech2 architecture, tog

RACAI 10 Jan 01, 2023
This is an implementation for the CVPR2020 paper "Learning Invariant Representation for Unsupervised Image Restoration"

Learning Invariant Representation for Unsupervised Image Restoration (CVPR 2020) Introduction This is an implementation for the paper "Learning Invari

GarField 88 Nov 07, 2022
Awesome AI Learning with +100 AI Cheat-Sheets, Free online Books, Top Courses, Best Videos and Lectures, Papers, Tutorials, +99 Researchers, Premium Websites, +121 Datasets, Conferences, Frameworks, Tools

All about AI with Cheat-Sheets(+100 Cheat-sheets), Free Online Books, Courses, Videos and Lectures, Papers, Tutorials, Researchers, Websites, Datasets

Niraj Lunavat 1.2k Jan 01, 2023
Official codebase for Pretrained Transformers as Universal Computation Engines.

universal-computation Overview Official codebase for Pretrained Transformers as Universal Computation Engines. Contains demo notebook and scripts to r

Kevin Lu 210 Dec 28, 2022
Video-Music Transformer

VMT Video-Music Transformer (VMT) is an attention-based multi-modal model, which generates piano music for a given video. Paper https://arxiv.org/abs/

Chin-Tung Lin 5 Jul 13, 2022
This is a repository for a No-Code object detection inference API using the OpenVINO. It's supported on both Windows and Linux Operating systems.

OpenVINO Inference API This is a repository for an object detection inference API using the OpenVINO. It's supported on both Windows and Linux Operati

BMW TechOffice MUNICH 68 Nov 24, 2022
Multi-Anchor Active Domain Adaptation for Semantic Segmentation (ICCV 2021 Oral)

Multi-Anchor Active Domain Adaptation for Semantic Segmentation Munan Ning*, Donghuan Lu*, Dong Wei†, Cheng Bian, Chenglang Yuan, Shuang Yu, Kai Ma, Y

Munan Ning 36 Dec 07, 2022
LBBA-boosted WSOD

LBBA-boosted WSOD Summary Our code is based on ruotianluo/pytorch-faster-rcnn and WSCDN Sincerely thanks for your resources. Newer version of our code

Martin Dong 20 Sep 19, 2022
Reference implementation for Structured Prediction with Deep Value Networks

Deep Value Network (DVN) This code is a python reference implementation of DVNs introduced in Deep Value Networks Learn to Evaluate and Iteratively Re

Michael Gygli 55 Feb 02, 2022
Self Driving RC Car Code

Derp Learning Derp Learning is a Python package that collects data, trains models, and then controls an RC car for track racing. Hardware You will nee

Not Karol 39 Dec 07, 2022
An end-to-end image translation model with weight-map for color constancy

CCUnet An end-to-end image translation model with weight-map for color constancy 1. Download the dataset (take Colorchecker_recommended dataset as an

Jianhui Qiu 1 Dec 21, 2021
PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models Code accompanying CVPR'20 paper of the same title. Paper lin

Alex Damian 7k Dec 30, 2022
Codes for CVPR2021 paper "PWCLO-Net: Deep LiDAR Odometry in 3D Point Clouds Using Hierarchical Embedding Mask Optimization"

PWCLO-Net: Deep LiDAR Odometry in 3D Point Clouds Using Hierarchical Embedding Mask Optimization (CVPR 2021) This is the official implementation of PW

Intelligent Robotics and Machine Vision Lab 42 Dec 18, 2022
This repository contains a toolkit for collecting, labeling and tracking object keypoints

This repository contains a toolkit for collecting, labeling and tracking object keypoints. Object keypoints are semantic points in an object's coordinate frame.

ETHZ ASL 13 Dec 12, 2022
DeepLab is a state-of-art deep learning system for semantic image segmentation built on top of Caffe.

DeepLab Introduction DeepLab is a state-of-art deep learning system for semantic image segmentation built on top of Caffe. It combines densely-compute

Ali 234 Nov 14, 2022
Deep Learning to Improve Breast Cancer Detection on Screening Mammography

Shield: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Deep Learning to Improve Breast

Li Shen 305 Jan 03, 2023
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation

Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation

Bae, Gwangbin 95 Jan 04, 2023
An 16kHz implementation of HiFi-GAN for soft-vc.

HiFi-GAN An 16kHz implementation of HiFi-GAN for soft-vc. Relevant links: Official HiFi-GAN repo HiFi-GAN paper Soft-VC repo Soft-VC paper Example Usa

Benjamin van Niekerk 42 Dec 27, 2022
for a paper about leveraging discourse markers for training new models

TSLM-DISCOURSE-MARKERS Scope This repository contains: (1) Code to extract discourse markers from wikipedia (TSA). (1) Code to extract significant dis

International Business Machines 6 Nov 02, 2022
WHENet - ONNX, OpenVINO, TFLite, TensorRT, EdgeTPU, CoreML, TFJS, YOLOv4/YOLOv4-tiny-3L

HeadPoseEstimation-WHENet-yolov4-onnx-openvino ONNX, OpenVINO, TFLite, TensorRT, EdgeTPU, CoreML, TFJS, YOLOv4/YOLOv4-tiny-3L 1. Usage $ git clone htt

Katsuya Hyodo 49 Sep 21, 2022