High-Resolution Image Synthesis with Latent Diffusion Models

Overview

Latent Diffusion Models

Requirements

A suitable conda environment named ldm can be created and activated with:

conda env create -f environment.yaml
conda activate ldm

Model Zoo

Pretrained Autoencoding Models


| Model | FID vs val | PSNR | PSIM | Link | Comments |
|---|---|---|---|---|---|
| f=4, VQ (Z=8192, d=3) | 0.58 | 27.43 +/- 4.26 | 0.53 +/- 0.21 | https://ommer-lab.com/files/latent-diffusion/vq-f4.zip | |
| f=4, VQ (Z=8192, d=3) | 1.06 | 25.21 +/- 4.17 | 0.72 +/- 0.26 | https://heibox.uni-heidelberg.de/f/9c6681f64bb94338a069/?dl=1 | no attention |
| f=8, VQ (Z=16384, d=4) | 1.14 | 23.07 +/- 3.99 | 1.17 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/vq-f8.zip | |
| f=8, VQ (Z=256, d=4) | 1.49 | 22.35 +/- 3.81 | 1.26 +/- 0.37 | https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip | |
| f=16, VQ (Z=16384, d=8) | 5.15 | 20.83 +/- 3.61 | 1.73 +/- 0.43 | https://heibox.uni-heidelberg.de/f/0e42b04e2e904890a9b6/?dl=1 | |
| f=4, KL | 0.27 | 27.53 +/- 4.54 | 0.55 +/- 0.24 | https://ommer-lab.com/files/latent-diffusion/kl-f4.zip | |
| f=8, KL | 0.90 | 24.19 +/- 4.19 | 1.02 +/- 0.35 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | |
| f=16, KL (d=16) | 0.87 | 24.08 +/- 4.22 | 1.07 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/kl-f16.zip | |
| f=32, KL (d=64) | 2.04 | 22.27 +/- 3.93 | 1.41 +/- 0.40 | https://ommer-lab.com/files/latent-diffusion/kl-f32.zip | |
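
To help read the model names, the sketch below assumes the usual conventions of the latent diffusion paper: f is the spatial downsampling factor of the autoencoder, d is the number of latent channels, and Z is the codebook size of the VQ variants. The helper `latent_shape` is hypothetical and not part of this repository; it only illustrates how the latent size follows from f and d.

```python
def latent_shape(height, width, f, d):
    """Return (channels, height, width) of the latent for an input image.

    Assumption: the encoder downsamples each spatial side by exactly f,
    so both sides must be divisible by f.
    """
    if height % f or width % f:
        raise ValueError(f"image size {height}x{width} is not divisible by f={f}")
    return (d, height // f, width // f)


# A 256x256 input under a few of the configurations listed in the table.
configs = [("f=4, VQ", 4, 3), ("f=8, VQ", 8, 4), ("f=16, KL", 16, 16), ("f=32, KL", 32, 64)]
for name, f, d in configs:
    print(name, latent_shape(256, 256, f, d))
```

Larger f gives a smaller latent grid, which makes diffusion cheaper but, as the FID and PSNR columns suggest, tends to cost reconstruction quality.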

Get the models

Running the following script downloads and extracts all available pretrained autoencoding models.

bash scripts/download_first_stages.sh

The first stage models can then be found in models/first_stage_models/

Pretrained LDMs

| Dataset | Task | Model | FID | IS | Prec | Recall | Link | Comments |
|---|---|---|---|---|---|---|---|---|
| CelebA-HQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 5.11 (5.11) | 3.29 | 0.72 | 0.49 | https://ommer-lab.com/files/latent-diffusion/celeba.zip | |
| FFHQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 4.98 (4.98) | 4.50 (4.50) | 0.73 | 0.50 | https://ommer-lab.com/files/latent-diffusion/ffhq.zip | |
| LSUN-Churches | Unconditional Image Synthesis | LDM-KL-8 (400 DDIM steps, eta=0) | 4.02 (4.02) | 2.72 | 0.64 | 0.52 | https://ommer-lab.com/files/latent-diffusion/lsun_churches.zip | |
| LSUN-Bedrooms | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 2.95 (3.0) | 2.22 (2.23) | 0.66 | 0.48 | https://ommer-lab.com/files/latent-diffusion/lsun_bedrooms.zip | |
| ImageNet | Class-conditional Image Synthesis | LDM-VQ-8 (200 DDIM steps, eta=1) | 7.77(7.76)* / 15.82** | 201.56(209.52)* / 78.82** | 0.84* / 0.65** | 0.35* / 0.63** | https://ommer-lab.com/files/latent-diffusion/cin.zip | *: w/ guiding, classifier_scale 10; **: w/o guiding; scores in brackets calculated with the script provided by ADM |
| Conceptual Captions | Text-conditional Image Synthesis | LDM-VQ-f4 (100 DDIM steps, eta=0) | 16.79 | 13.89 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/text2img.zip | finetuned from LAION |
| OpenImages | Super-resolution | N/A | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip | BSR image degradation |
| OpenImages | Layout-to-Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02 | 15.92 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip | |
| Landscapes (finetuned 512) | Semantic Image Synthesis | LDM-VQ-4 (100 DDIM steps, eta=1) | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip | |

Get the models

The LDMs listed above can jointly be downloaded and extracted via

bash scripts/download_models.sh

The models can then be found in models/ldm/.
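
A quick way to confirm the download worked is to list the checkpoints that were extracted. The sketch below is a hypothetical helper, not part of the repository; it assumes each model is unpacked to models/ldm/<model_spec>/model.ckpt, the layout used by the sampling command in this README.

```python
from pathlib import Path


def find_checkpoints(root="models/ldm"):
    """Return sorted paths matching <root>/<model_spec>/model.ckpt.

    Assumption: one subdirectory per model, each holding a model.ckpt file.
    Returns an empty list if the root directory does not exist.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(root.glob("*/model.ckpt"))


# Print whatever checkpoints are present; prints nothing before the download.
for ckpt in find_checkpoints():
    print(ckpt)
```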

Sampling with unconditional models

We provide a first script for sampling from our unconditional models. Start it via

CUDA_VISIBLE_DEVICES=<GPU_ID> python scripts/sample_diffusion.py -r models/ldm/<model_spec>/model.ckpt -l <logdir> -n <#samples> --batch_size <batch_size> -c <#ddim steps> -e <#eta>
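
As an illustration of how the placeholders might be filled in, the sketch below assembles the same command from Python using only the flags shown above. The function `build_sample_command` is hypothetical, and the model directory name `celeba256`, the log directory, and the sample counts are assumed example values; the 200 DDIM steps and eta=0 follow the CelebA-HQ row of the table.

```python
import shlex


def build_sample_command(model_spec, logdir, n_samples, batch_size, ddim_steps, eta):
    """Assemble the argument list for scripts/sample_diffusion.py.

    Only the flags that appear in the README command are used.
    """
    return [
        "python", "scripts/sample_diffusion.py",
        "-r", f"models/ldm/{model_spec}/model.ckpt",
        "-l", str(logdir),
        "-n", str(n_samples),
        "--batch_size", str(batch_size),
        "-c", str(ddim_steps),
        "-e", str(eta),
    ]


# Assumed example values; adjust to the model you downloaded.
cmd = build_sample_command("celeba256", "logs/celeba", 16, 8, 200, 0.0)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set in the environment to select the GPU.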

Coming Soon...

inpainting


Owner
CompVis Heidelberg
Computer Vision research group at the Ruprecht-Karls-University Heidelberg