A Human-in-the-Loop workflow for creating HD images from text

Overview

DALL·E Flow: A Human-in-the-Loop workflow for creating HD images from text
A Human-in-the-Loop? workflow for creating HD images from text

Open in Google Colab Open in Google Colab

DALL·E Flow is an interactive workflow for generating high-definition images from text prompt. First, it leverages DALL·E-Mega to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt. The preferred candidate is fed to GLID-3 XL for diffusion, which often enriches the texture and background. Finally, the candidate is upscaled to 1024x1024 via SwinIR.

DALL·E Flow is built with Jina in a client-server architecture, which gives it high scalability, non-blocking streaming, and a modern Pythonic interface. Client can interact with the server via gRPC/Websocket/HTTP with TLS.

Why Human-in-the-Loop? Generative art is a creative process. While recent advances of DALL·E unleash people's creativity, having a single-prompt-single-output UX/UI locks the imagination to a single possibility, which is bad no matter how fine this single result is. DALL·E Flow is an alternative to the one-liner, by formalizing the generative art as an iterative procedure.

Gallery

Image filename is the corresponding text prompt.

A raccoon astronaut with the cosmos reflecting on the glass of his helmet dreaming of the stars, digital artoil painting of a hamster drinking tea outsideAn oil pastel painting of an annoyed cat in a spaceshipa rainy night with a superhero perched above a city, in the style of a comic bookA synthwave style sunset above the reflecting water of the sea, digital arta 3D render of a rainbow colored hot air balloon flying above a reflective lakea teddy bear on a skateboard in Times Square A stained glass window of toucans in outer spacea campfire in the woods at night with the milky-way galaxy in the skyThe Hanging Gardens of Babylon in the middle of a city, in the style of DalíAn oil painting of a family reunited inside of an airport, digital artan oil painting of a humanoid robot playing chess in the style of Matissegolden gucci airpods realistic photo

Client

Open in Google Colab

Using client is super easy. The following steps are best run in Jupyter notebook or Google Colab.

You will need to install DocArray and Jina first:

pip install "docarray[common]>=0.13.5" jina

We have provided a demo server for you to play:

⚠️ Due to the massive requests now, the server is super busy. You can deploy your own server by following the instruction here.

server_url = 'grpc://dalle-flow.jina.ai:51005'

Step 1: Generate via DALL·E Mega

Now let's define the prompt:

prompt = 'an oil painting of a humanoid robot playing chess in the style of Matisse'

Let's submit it to the server and visualize the results:

from docarray import Document

da = Document(text=prompt).post(server_url, parameters={'num_images': 16}).matches

da.plot_image_sprites(fig_size=(10,10), show_index=True)

Here we generate 16 candidates as defined in num_images, which takes about ~2 minutes. You can use a smaller value if it is too long for you. The results are sorted by CLIP-as-service, with index-0 as the best candidate judged by CLIP.

Step 2: Select and refinement via GLID3 XL

Of course, you may think differently. Notice the number in the top-left corner? Select the one you like the most and get a better view:

fav_id = 3
fav = da[fav_id]
fav.display()

Now let's submit the selected candidates to the server for diffusion.

diffused = fav.post(f'{server_url}/diffuse', parameters={'skip_rate': 0.5}).matches

diffused.plot_image_sprites(fig_size=(10,10), show_index=True)

This will give 36 images based on the given image. You may allow the model to improvise more by giving skip_rate a near-zero value, or a near-one value to force its closeness to the given image. The whole procedure takes about ~2 minutes.

Step 3: Select and upscale via SwanIR

Select the image you like the most, and give it a closer look:

dfav_id = 34
fav = diffused[dfav_id]
fav.display()

Finally, submit to the server for the last step: upscaling to 1024 x 1024px.

fav = fav.post(f'{server_url}/upscale')
fav.display()

That's it! It is the one. If not satisfied, please repeat the procedure.

Btw, DocArray is a powerful and easy-to-use data structure for unstructured data. It is super productive for data scientists who work in cross-/multi-modal domain. To learn more about DocArray, please check out the docs.

Server

You can host your own server by following the instruction below.

Hardware requirements

It is highly recommended to run DALL·E Flow on a GPU machine. In fact, one GPU is probably not enough. DALL·E Mega needs one with 22GB memory. SwinIR and GLID-3 also need one; as they can be spawned on-demandly in seconds, they can share one GPU.

It requires at least 40GB free space on the hard drive, mostly for downloading pretrained models.

CPU-only environment is not tested and likely won't work. Google Colab is likely throwing OOM hence also won't work.

Install

Clone repos

mkdir dalle && cd dalle
git clone https://github.com/jina-ai/dalle-flow.git
git clone https://github.com/JingyunLiang/SwinIR.git
git clone https://github.com/CompVis/latent-diffusion.git
git clone https://github.com/Jack000/glid-3-xl.git

You should have the following folder structure:

dalle/
 |
 |-- dalle-flow/
 |-- SwinIR/
 |-- glid-3-xl/
 |-- latent-diffusion/

Install auxiliary repos

cd latent-diffusion && pip install -e . && cd -
cd glid-3-xl && pip install -e . && cd -

There are couple models we need to download first for GLID-3-XL:

wget https://dall-3.com/models/glid-3-xl/bert.pt
wget https://dall-3.com/models/glid-3-xl/kl-f8.pt
wget https://dall-3.com/models/glid-3-xl/finetune.pt

Install flow

cd dalle-flow
pip install -r requirements.txt

Start the server

Now you are under dalle-flow/, run the following command:

jina flow --uses flow.yml

You should see this screen immediately:

On the first start it will take ~8 minutes for downloading the DALL·E mega model and other necessary models. The proceeding runs should only take ~1 minute to reach the success message.

When everything is ready, you will see:

Congrats! Now you should be able to run the client.

You can modify and extend the server flow as you like, e.g. changing the model, adding persistence, or even auto-posting to Instagram/OpenSea. With Jina and DocArray, you can easily make DALL·E Flow cloud-native and ready for production.

Support

Join Us

DALL·E Flow is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers, solution engineers to build the next neural search ecosystem in open-source.

Owner
Jina AI
A Neural Search Company. We help businesses and developers to build neural search-powered applications in minutes.
Jina AI
Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows

Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows This is the official implementation of the ICCV 2021 Paper "Probabilistic Mono

62 Nov 23, 2022
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose (CVPR 2021)

Back to the Feature with PixLoc We introduce PixLoc, a neural network for end-to-end learning of camera localization from an image and a 3D model via

Computer Vision and Geometry Lab 610 Jan 05, 2023
A Python Library for Graph Outlier Detection (Anomaly Detection)

PyGOD is a Python library for graph outlier detection (anomaly detection). This exciting yet challenging field has many key applications, e.g., detect

PyGOD Team 757 Jan 04, 2023
A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

SVHNClassifier-PyTorch A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks If

Potter Hsu 182 Jan 03, 2023
a Lightweight library for sequential learning agents, including reinforcement learning

SaLinA: SaLinA - A Flexible and Simple Library for Learning Sequential Agents (including Reinforcement Learning) TL;DR salina is a lightweight library

Facebook Research 405 Dec 17, 2022
Relative Uncertainty Learning for Facial Expression Recognition

Relative Uncertainty Learning for Facial Expression Recognition The official implementation of the following paper at NeurIPS2021: Title: Relative Unc

35 Dec 28, 2022
A pytorch-version implementation codes of paper: "BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation"

BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation A pytorch-version implementation

11 Oct 08, 2022
🛠️ Tools for Transformers compression using Lightning ⚡

Bert-squeeze is a repository aiming to provide code to reduce the size of Transformer-based models or decrease their latency at inference time.

Jules Belveze 66 Dec 11, 2022
Wenzhou-Kean University AI-LAB

AI-LAB This is Wenzhou-Kean University AI-LAB. Our research interests are in Computer Vision and Natural Language Processing. Computer Vision Please g

WKU AI-LAB 10 May 05, 2022
This repo is about to create the Streamlit application for given ML model.

HR-Attritiion-using-Streamlit This repo is about to create the Streamlit application for given ML model. Problem Statement: Managing peoples at workpl

Pavan Giri 0 Dec 10, 2021
PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT.

MoCo v3 for Self-supervised ResNet and ViT Introduction This is a PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT. The original M

Facebook Research 887 Jan 08, 2023
Shape Matching of Real 3D Object Data to Synthetic 3D CADs (3DV project @ ETHZ)

Real2CAD-3DV Shape Matching of Real 3D Object Data to Synthetic 3D CADs (3DV project @ ETHZ) Group Member: Yue Pan, Yuanwen Yue, Bingxin Ke, Yujie He

24 Jun 22, 2022
The Pytorch implementation for "Video-Text Pre-training with Learned Regions"

Region_Learner The Pytorch implementation for "Video-Text Pre-training with Learned Regions" (arxiv) We are still cleaning up the code further and pre

Rui Yan 0 Mar 20, 2022
A Robust Unsupervised Ensemble of Feature-Based Explanations using Restricted Boltzmann Machines

A Robust Unsupervised Ensemble of Feature-Based Explanations using Restricted Boltzmann Machines Understanding the results of deep neural networks is

Johan van den Heuvel 2 Dec 13, 2021
Tensorflow implementation of ID-Unet: Iterative Soft and Hard Deformation for View Synthesis.

ID-Unet: Iterative-view-synthesis(CVPR2021 Oral) Tensorflow implementation of ID-Unet: Iterative Soft and Hard Deformation for View Synthesis. Overvie

17 Aug 23, 2022
Semi-Supervised Semantic Segmentation with Cross-Consistency Training (CCT)

Semi-Supervised Semantic Segmentation with Cross-Consistency Training (CCT) Paper, Project Page This repo contains the official implementation of CVPR

Yassine 344 Dec 29, 2022
Official implementation of NPMs: Neural Parametric Models for 3D Deformable Shapes - ICCV 2021

NPMs: Neural Parametric Models Project Page | Paper | ArXiv | Video NPMs: Neural Parametric Models for 3D Deformable Shapes Pablo Palafox, Aljaz Bozic

PabloPalafox 109 Nov 22, 2022
A minimal solution to hand motion capture from a single color camera at over 100fps. Easy to use, plug to run.

Minimal Hand A minimal solution to hand motion capture from a single color camera at over 100fps. Easy to use, plug to run. This project provides the

Yuxiao Zhou 824 Jan 07, 2023
Create and implement a deep learning library from scratch.

In this project, we create and implement a deep learning library from scratch. Table of Contents Deep Leaning Library Table of Contents About The Proj

Rishabh Bali 22 Aug 23, 2022
Orbivator AI - To Determine which features of data (measurements) are most important for diagnosing breast cancer and find out if breast cancer occurs or not.

Orbivator_AI Breast Cancer Wisconsin (Diagnostic) GOAL To Determine which features of data (measurements) are most important for diagnosing breast can

anurag kumar singh 1 Jan 02, 2022