Text-to-Image Translation (DALL-E) for TPU in PyTorch

A refactoring of Taming Transformers and DALLE-pytorch for TPU VMs, built on PyTorch Lightning.

Requirements

pip install -r requirements.txt

Data Preparation

Use any image dataset arranged in an ImageNet-style directory structure (at least one subfolder under the dataset root) so that it can be loaded with torchvision's ImageFolder.
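
Before starting a long run, it can help to confirm that a dataset root really has the layout described above. The following is a minimal sketch using only the standard library; `summarize_image_folder` is a hypothetical helper and not part of this repository, and the extension list is an assumption meant to mirror the image types ImageFolder accepts by default.

```python
from pathlib import Path

# Assumption: these mirror the extensions torchvision's ImageFolder accepts by default.
IMAGE_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp",
}


def summarize_image_folder(root):
    """Return {subfolder_name: image_count} for an ImageNet-style dataset root.

    Hypothetical helper for illustration only; it does not load any images.
    """
    root = Path(root)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")

    counts = {}
    # Each immediate subfolder of the root is treated as one class.
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Images may sit in nested folders below the class folder.
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )

    if not counts:
        raise ValueError(f"{root} has no subfolders; expected at least one")
    if all(n == 0 for n in counts.values()):
        raise ValueError(f"no image files found under {root}")
    return counts
```

Calling `summarize_image_folder("[training_set]")` on a valid root returns a dictionary of per-subfolder image counts; a root with no subfolders or no images raises `ValueError` instead.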

Training VQVAEs

You can quickly test train_vae.py with randomly generated fake data.

python train_vae.py --use_tpus --fake_data

For actual training, provide directories for train_dir, val_dir, and log_dir:

python train_vae.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results]

Training DALL-E

python train_dalle.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results] --vae_path [pretrained vae] --bpe_path [pretrained bpe(optional)]
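
When launching runs from a script, the command above can be assembled programmatically. The following is a minimal sketch: `build_dalle_command` is a hypothetical helper, not part of this repository, and it uses only the flags shown in the command above. It assumes `--bpe_path` can simply be left out when no pretrained BPE is available, as the "(optional)" note suggests.

```python
import shlex


def build_dalle_command(train_dir, val_dir, log_dir, vae_path,
                        bpe_path=None, use_tpus=True):
    """Build the argv list for train_dalle.py (hypothetical helper)."""
    cmd = ["python", "train_dalle.py"]
    if use_tpus:
        cmd.append("--use_tpus")
    cmd += [
        "--train_dir", str(train_dir),
        "--val_dir", str(val_dir),
        "--log_dir", str(log_dir),
        "--vae_path", str(vae_path),
    ]
    # Assumption: the BPE flag is omitted entirely when no path is given.
    if bpe_path is not None:
        cmd += ["--bpe_path", str(bpe_path)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths for illustration; substitute real directories.
    argv = build_dalle_command("data/train", "data/val", "logs", "vae.ckpt")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(argv))
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the repository root; keeping it as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.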

TODO

Refactor Encoder and Decoder modules for better readability
Refactor VQVAE2
Add Net2Net Conditional Transformer for conditional image generation
Refactor, optimize, and merge DALL-E with Net2Net Conditional Transformer
Add Guided Diffusion + CLIP for image refinement
Add VAE converter for JAX to support dalle-mini
Add DALL-E colab notebook
Add RBGumbelQuantizer
Add HiT

ON-GOING

Test large dataset loading on TPU Pods
Change current DALL-E code to fully support latest updates from DALLE-pytorch

DONE

BibTeX

@misc{oord2018neural,
      title={Neural Discrete Representation Learning}, 
      author={Aaron van den Oord and Oriol Vinyals and Koray Kavukcuoglu},
      year={2018},
      eprint={1711.00937},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{razavi2019generating,
      title={Generating Diverse High-Fidelity Images with VQ-VAE-2}, 
      author={Ali Razavi and Aaron van den Oord and Oriol Vinyals},
      year={2019},
      eprint={1906.00446},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{ramesh2021zeroshot,
      title={Zero-Shot Text-to-Image Generation},
      author={Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
      year={2021},
      eprint={2102.12092},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Owner

Kim, Taehoon
