A simple command line tool for text-to-image generation using OpenAI's CLIP and SIREN (an implicit neural representation network).

Overview

Deep Daze

mist over green hills

shattered plates on the grass

cosmic love and attention

a time traveler in the crowd

life during the plague

meditative peace in a sunlit forest

a man painting a completely red image

a psychedelic experience on LSD

What is this?

A simple command line tool for text-to-image generation using OpenAI's CLIP and SIREN. Credit goes to Ryan Murdock for the discovery of this technique (and for coming up with the great name)!

Original notebook (Open In Colab)

New simplified notebook (Open In Colab)

This will require that you have an Nvidia GPU.
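Before installing, you can verify that PyTorch can see your GPU (a generic PyTorch check, not a deep-daze command; the library asserts this same condition at startup):

$ python -c "import torch; print(torch.cuda.is_available())"  # should print True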

Install

$ pip install deep-daze

Examples

$ imagine "a house in the forest"

That's it.

If you have enough memory, you can get better quality by adding the --deeper flag (equivalent to --num-layers 32; see the CLI reference below).

$ imagine "shattered plates on the ground" --deeper

Advanced

In true deep learning fashion, more layers will yield better results. The default is 16, but it can be increased to 32 depending on your resources.

$ imagine "stranger in strange lands" --num-layers 32

Usage

CLI

NAME
    imagine

SYNOPSIS
    imagine TEXT <flags>

POSITIONAL ARGUMENTS
    TEXT
        (required) A phrase less than 77 characters which you would like to visualize.

FLAGS
    --img=IMAGE_PATH
        Default: None
        Path to png/jpg image or PIL image to optimize on
    --encoding=ENCODING
        Default: None
User-created custom CLIP encoding. If provided, it replaces any text or image input.
    --create_story=CREATE_STORY
        Default: False
        Creates a story by optimizing each epoch on a new sliding-window of the input words. If this is enabled, much longer texts than 77 chars can be used. Requires save_progress to visualize the transitions of the story.
    --story_start_words=STORY_START_WORDS
        Default: 5
        Only used if create_story is True. How many words to optimize on for the first epoch.
    --story_words_per_epoch=STORY_WORDS_PER_EPOCH
        Default: 5
        Only used if create_story is True. How many words to add to the optimization goal per epoch after the first one.
    --lower_bound_cutout=LOWER_BOUND_CUTOUT
        Default: 0.1
        Lower bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should be smaller than 0.8.
    --upper_bound_cutout=UPPER_BOUND_CUTOUT
        Default: 1.0
        Upper bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should probably stay at 1.0.
    --saturate_bound=SATURATE_BOUND
        Default: False
        If True, the LOWER_BOUND_CUTOUT is linearly increased to 0.75 during training.
    --learning_rate=LEARNING_RATE
        Default: 1e-05
        The learning rate of the neural net.
    --num_layers=NUM_LAYERS
        Default: 16
        The number of hidden layers to use in the Siren neural net.
    --batch_size=BATCH_SIZE
        Default: 4
        The number of generated images to pass into Siren before calculating loss. Decreasing this can lower memory and accuracy.
    --gradient_accumulate_every=GRADIENT_ACCUMULATE_EVERY
        Default: 4
        Calculate a weighted loss of n samples for each iteration. Increasing this can help increase accuracy with lower batch sizes.
    --epochs=EPOCHS
        Default: 20
        The number of epochs to run.
    --iterations=ITERATIONS
        Default: 1050
        The number of times to calculate and backpropagate loss in a given epoch.
    --save_every=SAVE_EVERY
        Default: 100
        Generate an image every time iterations is a multiple of this number.
    --image_width=IMAGE_WIDTH
        Default: 512
        The desired resolution of the image.
    --deeper=DEEPER
        Default: False
        Uses a Siren neural net with 32 hidden layers.
    --overwrite=OVERWRITE
        Default: False
        Whether or not to overwrite existing generated images of the same name.
    --save_progress=SAVE_PROGRESS
        Default: False
Whether or not to save intermediate images generated before Siren training is complete.
    --seed=SEED
        Type: Optional[]
        Default: None
        A seed to be used for deterministic runs.
    --open_folder=OPEN_FOLDER
        Default: True
        Whether or not to open a folder showing your generated images.
    --save_date_time=SAVE_DATE_TIME
        Default: False
        Save files with a timestamp prepended e.g. `%y%m%d-%H%M%S-my_phrase_here`
    --start_image_path=START_IMAGE_PATH
        Default: None
The generator is first trained on a starting image before being steered towards the textual input
    --start_image_train_iters=START_IMAGE_TRAIN_ITERS
        Default: 50
        The number of steps for the initial training on the starting image
    --theta_initial=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the first layer of the network.
    --theta_hidden=THETA_HIDDEN
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the hidden layers of the network.
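Flags can be combined freely. For example, a reproducible run that saves intermediate images (a sketch using only the flags documented above; the prompt and values are arbitrary):

$ imagine "cosmic love and attention" --num-layers 32 --seed 1234 --save-every 50 --save-progress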

Priming

This technique, first devised and shared by Mario Klingemann, allows you to prime the generator network with a starting image before it is steered towards the text.

Simply specify the path to the image you wish to use, and optionally the number of initial training steps.

$ imagine 'a clear night sky filled with stars' --start-image-path ./cloudy-night-sky.jpg
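Optionally, you can also set the number of initial training steps with the corresponding flag (the value here is arbitrary):

$ imagine 'a clear night sky filled with stars' --start-image-path ./cloudy-night-sky.jpg --start-image-train-iters 200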

Primed starting image

Then trained with the prompt "A pizza with green pepper".

Optimize for the interpretation of an image

We can also feed in an image as an optimization goal, instead of only priming the generator network. Deep Daze will then render its own interpretation of that image:

$ imagine --img samples/Autumn_1875_Frederic_Edwin_Church.jpg

Original image:

The network's interpretation:

Original image:

The network's interpretation:

Optimize for text and image combined

$ imagine "A psychedelic experience." --img samples/hot-dog.jpg

The network's interpretation:

New: Create a story

Regular mode only allows texts of up to 77 characters. If you want to visualize a full story/paragraph/song/poem, set create_story to True.
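For example (a sketch using the story flags documented above; the placeholder stands in for your full text):

$ imagine "<your full story, paragraph, song, or poem>" --create-story --story-start-words 5 --story-words-per-epoch 5 --save-progress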

Given the poem “Stopping by Woods On a Snowy Evening” by Robert Frost - "Whose woods these are I think I know. His house is in the village though; He will not see me stopping here To watch his woods fill up with snow. My little horse must think it queer To stop without a farmhouse near Between the woods and frozen lake The darkest evening of the year. He gives his harness bells a shake To ask if there is some mistake. The only other sound’s the sweep Of easy wind and downy flake. The woods are lovely, dark and deep, But I have promises to keep, And miles to go before I sleep, And miles to go before I sleep.".

We get:

https://user-images.githubusercontent.com/19983153/109539633-d671ef80-7ac1-11eb-8d8c-380332d7c868.mp4

Python

Invoke deep_daze.Imagine in Python

from deep_daze import Imagine

imagine = Imagine(
    text = 'cosmic love and attention',
    num_layers = 24,
)
imagine()

Save progress every fourth iteration

Save images in the format insert_text_here.00001.png, insert_text_here.00002.png, ... up to (total_iterations / save_every)

imagine = Imagine(
    text=text,
    save_every=4,
    save_progress=True
)

Prepend the current timestamp to each image.

Creates files with both the timestamp and the sequence number.

e.g. 210129-043928_328751_insert_text_here.00001.png, 210129-043928_512351_insert_text_here.00002.png, ...

imagine = Imagine(
    text=text,
    save_every=4,
    save_progress=True,
    save_date_time=True,
)

High GPU memory usage

If you have at least 16 GiB of VRAM available, you should be able to run these settings with some wiggle room.

imagine = Imagine(
    text=text,
    num_layers=42,
    batch_size=64,
    gradient_accumulate_every=1,
)

Average GPU memory usage

imagine = Imagine(
    text=text,
    num_layers=24,
    batch_size=16,
    gradient_accumulate_every=2
)

Very low GPU memory usage (less than 4 GiB)

If you are desperate to run this on a card with less than 8 GiB of VRAM, you can lower the image_width.

imagine = Imagine(
    text=text,
    image_width=256,
    num_layers=16,
    batch_size=1,
    gradient_accumulate_every=16 # Increase gradient_accumulate_every to correct for loss in low batch sizes
)
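For intuition, gradient accumulation backpropagates several small batches and then takes a single optimizer step, approximating a larger effective batch size. A generic PyTorch sketch of the idea (not the actual deep-daze internals; loss_fn is a hypothetical stand-in):

import torch

def accumulated_step(model, optimizer, loss_fn, gradient_accumulate_every=16):
    # average gradients over several small forward passes,
    # then take one optimizer step
    optimizer.zero_grad()
    for _ in range(gradient_accumulate_every):
        loss = loss_fn(model)
        (loss / gradient_accumulate_every).backward()
    optimizer.step()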

VRAM and speed benchmarks:

These experiments were conducted with an RTX 2060 Super and a Ryzen 7 3700X. Each entry lists the parameters (bs = batch size), then the memory usage, and in some cases the training iterations per second:

For an image resolution of 512:

  • bs 1, num_layers 22: 7.96 GB
  • bs 2, num_layers 20: 7.5 GB
  • bs 16, num_layers 16: 6.5 GB

For an image resolution of 256:

  • bs 8, num_layers 48: 5.3 GB
  • bs 16, num_layers 48: 5.46 GB - 2.0 it/s
  • bs 32, num_layers 48: 5.92 GB - 1.67 it/s
  • bs 8, num_layers 44: 5 GB - 2.39 it/s
  • bs 32, num_layers 44, grad_acc 1: 5.62 GB - 4.83 it/s
  • bs 96, num_layers 44, grad_acc 1: 7.51 GB - 2.77 it/s
  • bs 32, num_layers 66, grad_acc 1: 7.09 GB - 3.7 it/s

@NotNANtoN recommends a batch size of 32 with 44 layers and training 1-8 epochs.
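Translated into the Python interface, that recommendation would look roughly like this (a sketch using the parameters documented above; the prompt is arbitrary):

from deep_daze import Imagine

imagine = Imagine(
    text = 'shattered plates on the grass',
    image_width = 256,             # matches the 256-resolution benchmarks above
    num_layers = 44,
    batch_size = 32,
    gradient_accumulate_every = 1,
    epochs = 8,                    # 1-8 epochs recommended
)
imagine()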

Where is this going?

This is just a teaser. We will be able to generate images, sound, anything at will, with natural language. The holodeck is about to become real in our lifetimes.

Please join replication efforts for DALL-E for Pytorch or Mesh Tensorflow if you are interested in furthering this technology.

Alternatives

Big Sleep - CLIP and the generator from Big GAN

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}
@misc{sitzmann2020implicit,
    title   = {Implicit Neural Representations with Periodic Activation Functions},
    author  = {Vincent Sitzmann and Julien N. P. Martel and Alexander W. Bergman and David B. Lindell and Gordon Wetzstein},
    year    = {2020},
    eprint  = {2006.09661},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
Comments
  • Control the starting seed?

I wanted to make a PR, but I couldn't figure out where the best place would be to intervene in the cascade of initialization calls up to Siren itself.

    Compared to the original notebook, which randomizes its initial seed on each run, the deep-daze approach currently always seems to start with a very noir seed:

    and amazingly CLIP is quite happy to navigate inside it and keep generating in that style (even when I would prefer a cheery light generation instead). Here are two examples of the types of generation one ends up with:

    Ideally one would like to specify some hex color as "dominant" for the seed and expose that as a command line option, but I could do that myself if I found out where to look. Could you ( @lucidrains ) point me to the right general area where one would set a custom init seed for Siren?

    opened by dginev 30
  • Some new augmentations

• I included the augmentations mentioned in #66. Mainly, avg_feats (leads to more concrete scenes) and center_bias (causes the object in question, if the sentence mentions one, to be centered in the middle of the image) are interesting.
    • I fixed the shape problem of the start_image/image priming in #100. For my one test image it just turns into a completely white image, not sure why
    • I added the option to choose between Adam, AdamP and DiffGrad. There are minor differences between them, but in general AdamP is fine by default
• I added the option to choose a ResNet perceptor (to avoid having the perceptor in the parameters of the Imagine and DeepDaze instances, I put it into a list - a bit ugly, but it works).
    opened by NotNANtoN 23
  • Much better quality and easier size schedule

    Main changes

    I cleaned up the cryptic size scheduling that was used for the sampling of random cut-out sizes. Before there was a weird scheme that adapted neither to the batch size nor the total number of episodes. I inspected it in detail and found that the sampling scheme was sampling in ranges of 0.1 for intervals starting at 0.49 to 1.09 (depending on the schedule). A comment in the code says that the context should increase as the model saturates - which means the sampling should be closer to 1.

The new approach is simple: the random sizes are uniformly sampled between a lower bound (default=0.1) and an upper bound (default=1.0). Both are customizable by the user in the Imagine class. I emulated some scheduling by adding the saturate_bound parameter. If set to True, it linearly increases the lower bound from the starting value to a limit during training. I set the limit to 0.8 because from 0.8 and above the generations become washed out and unstable. I also noticed that this scheduling does not really bring about any benefits, but I have not experimented extensively with it.
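    A minimal sketch of the sampling scheme as described (function names are hypothetical, not the actual deep-daze internals):

    import torch

    def sample_cutout_sizes(batch_size, lower_bound=0.1, upper_bound=1.0):
        # uniformly sample one relative cut-out size per image in the batch
        return torch.empty(batch_size).uniform_(lower_bound, upper_bound)

    def saturated_lower_bound(step, total_steps, start=0.1, limit=0.8):
        # saturate_bound=True: linearly raise the lower bound towards the limit
        return start + (limit - start) * min(step / total_steps, 1.0)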

    Minor changes

    • I cleaned up some transformations/normalizations here and there to make it more uniform.
    • I added some more explanations in the README for the parameters, but have NOT yet included it in the CLI code.
• I also changed the images to be saved as high-quality JPGs instead of PNGs, to make saving faster and the image files smaller - quality differences cannot be noticed with my naked eye.

    Results

    Examples from old README

Anyway, the performance is (from my visual inspection) MUCH better now. I recreated the examples that are currently in the README (num_layers=44, batch_size=32, gradient_accumulate_every=1, 5 epochs - needs less than 8 GB of VRAM and about 20 mins):

    https://user-images.githubusercontent.com/19983153/109133200-0bdeac00-7755-11eb-8c87-bd18ab38bad6.mp4

    https://user-images.githubusercontent.com/19983153/109133230-1305ba00-7755-11eb-840e-d424bb6cbd75.mp4

    https://user-images.githubusercontent.com/19983153/109133247-17ca6e00-7755-11eb-944d-775b46da1d61.mp4

    https://user-images.githubusercontent.com/19983153/109133352-3892c380-7755-11eb-9ac3-4ce9cf031c27.mp4

    Life_during_the_plague

    Meditative_peace_in_a_sunlit_forest

A very fancy one is "A psychedelic experience on LSD":

    A_psychedelic_experience_on_LSD 000050

@lucidrains Feel free to replace the images with the new ones. I can also do it, if you consent.

    Generations from img and img+text

    Some more hot-dog images to show that this still works: Generations using "A dog in a hotdog costume":

    A_dog_in_a_hotdog_costume 000087(1)

    A_dog_in_a_hotdog_costume 000087

    Now given this starting image: hot-dog

    We can generate:

hot-dog_imagined

Adding "A psychedelic experience" as text to the image:

A_psychedelic_experience 000064

Adding the text "A dog in a hotdog costume" to the image does not work too nicely:

A_dog_in_a_hotdog_costume 000087(2)

    Story creation

    Lastly, I can show the story creation feature of the last PR (although with few generations per epoch, so the dream kind of happens too quickly):

    "I dreamed that I was with my coworkers having a splendid party in someone's house. Even though I had many people surrounding me, I felt so lonely and I just wanted to cry. I went to the bathroom and something hit me, and I woke up."

    https://user-images.githubusercontent.com/19983153/109135224-1f8b1200-7757-11eb-9ba7-ae7540cd0401.mp4

    "I dreamt the house across the street from me was on fire. The people who live there were not there. It was a friend of my family and her daughter. I was looking out the window and saw all the smoke so I called 911 but it was busy."

    https://user-images.githubusercontent.com/19983153/109135243-26b22000-7757-11eb-954d-6c0d54e8c34d.mp4

    opened by NotNANtoN 19
  • AssertionError: CUDA must be available in order to use Deep Daze

    File "c:\python39\lib\runpy.py", line 197, in _run_module_as_main return run_code(code, main_globals, None, File "c:\python39\lib\runpy.py", line 87, in run_code exec(code, run_globals) File "C:\Python39\Scripts\imagine.exe_main.py", line 4, in File "c:\python39\lib\site-packages\deep_daze_init.py", line 1, in from deep_daze.deep_daze import DeepDaze, Imagine File "c:\python39\lib\site-packages\deep_daze\deep_daze.py", line 25, in assert torch.cuda.is_available(), 'CUDA must be available in order to use Deep Daze' AssertionError: CUDA must be available in order to use Deep Daze

I have installed the CUDA Toolkit; what do I need to do to fix this?

    opened by itsHNTR 12
  • Memory error when generating image

    I encounter this error upon running:

    Traceback (most recent call last):
      File "c:\users\miner\appdata\local\programs\python\python38\lib\runpy.py", line 192, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "c:\users\miner\appdata\local\programs\python\python38\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\Miner\AppData\Local\Programs\Python\Python38\Scripts\imagine.exe\__main__.py", line 7, in <module>
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\cli.py", line 111, in main
        fire.Fire(train)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\fire\core.py", line 141, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\fire\core.py", line 466, in _Fire
        component, remaining_args = _CallAndUpdateTrace(
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\cli.py", line 107, in train
        imagine()
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\deep_daze.py", line 447, in forward
        _, loss = self.train_step(epoch, i)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\deep_daze.py", line 380, in train_step
        out, loss = self.model(self.clip_encoding)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\deep_daze.py", line 168, in forward
        out = self.model()
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 97, in forward
        out = self.net(coords)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 76, in forward
        x = self.net(x)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
        input = module(input)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 48, in forward
        out = self.activation(out)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 19, in forward
        return torch.sin(self.w0 * x)
    RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 6.00 GiB total capacity; 3.85 GiB already allocated; 79.44 MiB free; 3.87 GiB reserved in total by PyTorch)
    

I attempted clearing the CUDA cache, but the same error occurred.

    >>> import torch
    >>> torch.cuda.empty_cache()
    
    opened by amcwb 11
  • "RuntimeError: Method 'forward' is not defined."

I've tried to run the imagine command, but this is what I get every time I run it:

    (venv) C:\WINDOWS\system32>imagine "alone in the dark"
    Traceback (most recent call last):
      File "c:\program files\python38\lib\runpy.py", line 192, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "c:\program files\python38\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Program Files\Python38\Scripts\imagine.exe\__main__.py", line 4, in <module>
      File "c:\program files\python38\lib\site-packages\deep_daze\__init__.py", line 1, in <module>
        from deep_daze.deep_daze import DeepDaze, Imagine
      File "c:\program files\python38\lib\site-packages\deep_daze\deep_daze.py", line 39, in <module>
        perceptor, normalize_image = load()
      File "c:\program files\python38\lib\site-packages\deep_daze\clip.py", line 192, in load
        model.apply(patch_device)
      File "c:\program files\python38\lib\site-packages\torch\nn\modules\module.py", line 473, in apply
        module.apply(fn)
      File "c:\program files\python38\lib\site-packages\torch\nn\modules\module.py", line 473, in apply
        module.apply(fn)
      File "c:\program files\python38\lib\site-packages\torch\nn\modules\module.py", line 473, in apply
        module.apply(fn)
      [Previous line repeated 3 more times]
      File "c:\program files\python38\lib\site-packages\torch\nn\modules\module.py", line 474, in apply
        fn(self)
      File "c:\program files\python38\lib\site-packages\deep_daze\clip.py", line 183, in patch_device
        graphs = [module.graph] if hasattr(module, "graph") else []
      File "c:\program files\python38\lib\site-packages\torch\jit\_script.py", line 449, in graph
        return self._c._get_method("forward").graph
    RuntimeError: Method 'forward' is not defined.

I'm new to all of this, so it's kind of confusing. Is there any fix for this "RuntimeError: Method 'forward' is not defined."?

    opened by NuclearSurvivor 10
  • Method 'forward' is not defined

    I installed the module via

    $ pip install deep-daze

    and just tried the provided example with

    $ imagine "a house in the forest"

but after it loaded something for a few minutes (the first time I ran the command), it threw this error:

    Traceback (most recent call last):
      File "/home/luca/anaconda3/bin/imagine", line 5, in <module>
        from deep_daze.cli import main
      File "/home/luca/anaconda3/lib/python3.7/site-packages/deep_daze/__init__.py", line 1, in <module>
        from deep_daze.deep_daze import DeepDaze, Imagine
      File "/home/luca/anaconda3/lib/python3.7/site-packages/deep_daze/deep_daze.py", line 39, in <module>
        perceptor, normalize_image = load()
      File "/home/luca/anaconda3/lib/python3.7/site-packages/deep_daze/clip.py", line 192, in load
        model.apply(patch_device)
      File "/home/luca/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 473, in apply
        module.apply(fn)
      File "/home/luca/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 473, in apply
        module.apply(fn)
      File "/home/luca/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 473, in apply
        module.apply(fn)
      [Previous line repeated 3 more times]
      File "/home/luca/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 474, in apply
        fn(self)
      File "/home/luca/anaconda3/lib/python3.7/site-packages/deep_daze/clip.py", line 183, in patch_device
        graphs = [module.graph] if hasattr(module, "graph") else []
      File "/home/luca/anaconda3/lib/python3.7/site-packages/torch/jit/_script.py", line 449, in graph
        return self._c._get_method("forward").graph
    RuntimeError: Method 'forward' is not defined.
    

    My system is:

• Ubuntu 18.04.4 LTS
    • GeForce RTX 2070
    • PyTorch 1.7.1
    • Python 3.7.1

    opened by Binbose 10
  • (Suggestion) Include more useful parameters as form inputs in colab

I've included these form inputs in my personal copy of your Colab notebook. In particular, reducing the image_width parameter allows one to vastly increase the number of hidden layers. By going to an image_width of 256 (instead of the default 512), I was able to run 32 hidden layers without problems on a T4.

    from tqdm import trange
    from IPython.display import Image, display
    
    from deep_daze import Imagine
    
    TEXT = 'blue marshmallow' #@param {type:"string"}
    NUM_LAYERS = 16 #@param {type:"number"}
SAVE_EVERY = 20 #@param {type:"number"}
    IMAGE_WIDTH = 512 #@param {type:"number"}
    SAVE_PROGRESS = False #@param {type:"boolean"}
    LEARNING_RATE = 1e-5 #@param {type:"number"}
    ITERATIONS = 1050 #@param {type:"number"}
    
    model = Imagine(
        text = TEXT,
        num_layers = NUM_LAYERS,
        save_every = SAVE_EVERY,
        image_width = IMAGE_WIDTH,
        lr = LEARNING_RATE,
        iterations = ITERATIONS,
        save_progress = SAVE_PROGRESS
    )
    

    Feel free to include them in your copy if you'd like to.

    opened by afiaka87 10
  • Is this a bug? (Edit: Replace start_image with NotNANtoN's `img` clip embed?)

    https://github.com/lucidrains/deep-daze/blob/964004154957dbb2f4ca231b03a057dc7baf16f2/deep_daze/deep_daze.py#L321

    Saw this new functionality added. Super useful. Just making sure this function works correctly. It looks like it's called during init, but because it returns in its nested ifs, it only ever runs the code for the img_embed if you didn't specify a clip_encode (I think).
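    For reference, a hypothetical sketch of the control-flow shape being described (not the actual deep-daze source; create_img_encoding is an assumed helper name):

    def create_clip_encoding(self, text=None, img=None, encoding=None):
        # once `encoding` is given, the img branch below is never reached
        if encoding is not None:
            return encoding
        if img is not None:
            return self.create_img_encoding(img)
        return self.create_text_encoding(text)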

    opened by afiaka87 9
  • [Suggestion] Begin with encoded image / implicit neural representation of user image

From what I can tell, SIREN should be very capable of encoding a supplied bitmap image to an implicit neural representation. I haven't figured out how to do it myself yet, but the ability to begin a session of deep-dazing with a specific image (encoded to an INR to some level of completion) should be very helpful with guiding the image generation, or perhaps even image modification. Or old Deep Dream style hallucinations.

[Rambling] One of the first things I tried to do with the original notebook was make an emote. Well, it didn't work. It made a hazy, half-remembered dream image of a screen with nondescript emotes on it. Then I realized that if I stopped the training and didn't generate a network, I could swap out the CLIP prompt and steer the ship, so to speak. From there it was trying to get it to generate a yellow circle, orb, or ball, and that wasn't happening.

But what if it could begin with an image of a yellow circle? Or a yellow circle with eyes and a mouth? Would it manage to make an emote out of it when prompted "visceral nightmare emoji"? Or would it cover the yellow circle with strange shapes that have little to do with the supplied image or structure? I don't actually know. But at the very least it may end up with an aesthetic like the old Deep Dream, putting eyes and spider legs on everything.

    Or perhaps something to force the generation to follow certain shapes by warping the initial -1 to 1 2D grid / mgrid that was in the old notebook.
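    As a rough illustration of this suggestion, here is a minimal sketch (assumed shapes and hyperparameters, not part of deep-daze) that fits a siren-pytorch network to a target bitmap, yielding an implicit neural representation to start from:

    import torch
    import torch.nn.functional as F
    from siren_pytorch import SirenNet

    # map 2D coordinates to RGB, as the Deep Daze generator does
    net = SirenNet(dim_in=2, dim_hidden=256, dim_out=3, num_layers=16)
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)

    side = 256
    axis = torch.linspace(-1.0, 1.0, side)
    coords = torch.stack(torch.meshgrid(axis, axis, indexing='ij'), dim=-1).reshape(-1, 2)

    # stand-in for your flattened RGB image with values in [0, 1]
    target = torch.rand(side * side, 3)

    for _ in range(200):
        loss = F.mse_loss(net(coords), target)
        opt.zero_grad()
        loss.backward()
        opt.step()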

    opened by torridgristle 8
  • FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/deep_daze/data/bpe_simple_vocab_16e6.txt'

I'm trying this out in Colab and facing the above error. Here's the full stack:

    Traceback (most recent call last):
      File "/usr/local/bin/imagine", line 5, in <module>
        from deep_daze.cli import main
      File "/usr/local/lib/python3.6/dist-packages/deep_daze/__init__.py", line 1, in <module>
        from deep_daze.deep_daze import DeepDaze, Imagine
      File "/usr/local/lib/python3.6/dist-packages/deep_daze/deep_daze.py", line 11, in <module>
        from deep_daze.clip import load, tokenize, normalize_image
      File "/usr/local/lib/python3.6/dist-packages/deep_daze/clip.py", line 223, in <module>
        _tokenizer = SimpleTokenizer()
      File "/usr/local/lib/python3.6/dist-packages/deep_daze/clip.py", line 64, in __init__
        merges = Path(bpe_path).read_text().split('\n')
      File "/usr/lib/python3.6/pathlib.py", line 1196, in read_text
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
      File "/usr/lib/python3.6/pathlib.py", line 1183, in open
        opener=self._opener)
      File "/usr/lib/python3.6/pathlib.py", line 1037, in _opener
        return self._accessor.open(self, flags, mode)
      File "/usr/lib/python3.6/pathlib.py", line 387, in wrapped
        return strfunc(str(pathobj), *args)
    FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/deep_daze/data/bpe_simple_vocab_16e6.txt'

    opened by Vaibhav21pandit 8
  • README.md incorrectly states that deep-daze works with AMD GPUs

    "This will require that you have an Nvidia GPU or AMD GPU"

    Correct me if I am wrong, but I think it's still the case that an Nvidia GPU is required, otherwise Deep-Daze uses the CPU.

    opened by mh0w 1
  • CUDA capability sm_86 is not compatible with the current PyTorch installation

Hello! I am using an RTX 3080 Ti and I can't figure out which PyTorch and CUDA versions to use in order to get it working.

    • The current CUDA version is 11.7.
    • The current PyTorch version is 1.12.0+cu102

    The full error message is:

    Setting jit to False because torch version is not 1.7.1.
    /home/user/.local/lib/python3.8/site-packages/torch/cuda/__init__.py:146: UserWarning:
    NVIDIA GeForce RTX 3080 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
    The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
    If you want to use the NVIDIA GeForce RTX 3080 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
      warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
    Traceback (most recent call last):
      File "/home/user/.local/bin/imagine", line 8, in <module>
        sys.exit(main())
      File "/home/user/.local/lib/python3.8/site-packages/deep_daze/cli.py", line 151, in main
        fire.Fire(train)
      File "/home/user/.local/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "/home/user/.local/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
        component, remaining_args = _CallAndUpdateTrace(
      File "/home/user/.local/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "/home/user/.local/lib/python3.8/site-packages/deep_daze/cli.py", line 99, in train
        imagine = Imagine(
      File "/home/user/.local/lib/python3.8/site-packages/deep_daze/deep_daze.py", line 396, in __init__
        self.clip_encoding = self.create_clip_encoding(text=text, img=img, encoding=clip_encoding)
      File "/home/user/.local/lib/python3.8/site-packages/deep_daze/deep_daze.py", line 424, in create_clip_encoding
        encoding = self.create_text_encoding(text)
      File "/home/user/.local/lib/python3.8/site-packages/deep_daze/deep_daze.py", line 432, in create_text_encoding
        text_encoding = self.perceptor.encode_text(tokenized_text).detach()
      File "/home/user/.local/lib/python3.8/site-packages/deep_daze/clip.py", line 525, in encode_text
        x = self.token_embedding(text).type(self.dtype)  # [batch_size, n_ctx, d_model]
      File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
        return F.embedding(
      File "/home/user/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: CUDA error: no kernel image is available for execution on the device
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    opened by marceljhuber 1
  • TypeError: '<=' not supported between instances of 'str' and 'float'

    C:\Users\User>imagine a jasmine flower
    Setting jit to False because torch version is not 1.7.1.
    Traceback (most recent call last):
      File "c:\users\ung\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "c:\users\ung\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "C:\Users\UNG\AppData\Local\Programs\Python\Python39\Scripts\imagine.exe\__main__.py", line 7, in <module>
      File "c:\users\ung\appdata\local\programs\python\python39\lib\site-packages\deep_daze\cli.py", line 151, in main
        fire.Fire(train)
      File "c:\users\ung\appdata\local\programs\python\python39\lib\site-packages\fire\core.py", line 141, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "c:\users\ung\appdata\local\programs\python\python39\lib\site-packages\fire\core.py", line 466, in _Fire
        component, remaining_args = _CallAndUpdateTrace(
      File "c:\users\ung\appdata\local\programs\python\python39\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "c:\users\ung\appdata\local\programs\python\python39\lib\site-packages\deep_daze\cli.py", line 99, in train
        imagine = Imagine(
      File "c:\users\ung\appdata\local\programs\python\python39\lib\site-packages\deep_daze\deep_daze.py", line 380, in __init__
        self.optimizer = AdamP(siren_params, lr)
      File "c:\users\ung\appdata\local\programs\python\python39\lib\site-packages\torch_optimizer\adamp.py", line 57, in __init__
        if lr <= 0.0:
    TypeError: '<=' not supported between instances of 'str' and 'float'

    opened by LJBoxx 1
  • CUDA not available while using deep-daze (i'm using a 3070 so it should have cuda cores)

I got this error when running it both as a normal user and as an admin: [image](https://user-images.githubusercontent.com/82838374/163880302-d7644373-1fc2-4860-90f6-85c6b3df29aa.png)

    How can I fix this (if possible)?

    opened by uPos3odon08 1
  • Updated Simplified Notebook.

Colab link: updated the simplified notebook with image interpolation (by simply using a bigger grid), saving/loading a pre-trained Siren net, and retraining the Siren net on a different prompt. Also added super resolution from https://github.com/krasserm.

    opened by Vbansal21 0