Official code for "Maximum Likelihood Training of Score-Based Diffusion Models", NeurIPS 2021 (spotlight)

Overview

Maximum Likelihood Training of Score-Based Diffusion Models

This repo contains the official implementation for the paper Maximum Likelihood Training of Score-Based Diffusion Models

by Yang Song*, Conor Durkan*, Iain Murray, and Stefano Ermon. Published in NeurIPS 2021 (spotlight).


We prove the connection between the Kullback–Leibler divergence and the weighted combination of score matching losses used for training score-based generative models. Our results can be viewed as a generalization of both the de Bruijn identity in information theory and the evidence lower bound in variational inference.

Our theoretical results enable ScoreFlow, a continuous normalizing flow model trained with a variational objective, which is much more efficient than neural ODEs. We report the state-of-the-art likelihood on CIFAR-10 and ImageNet 32x32 among all flow models, achieving comparable performance to cutting-edge autoregressive models.

How to run the code

Dependencies

Run the following to install a subset of necessary python packages for our code

pip install -r requirements.txt

Stats files for quantitative evaluation

We provide stats files for computing FID and Inception scores for CIFAR-10 and ImageNet 32x32. You can find cifar10_stats.npz and imagenet32_stats.npz under the directory assets/stats in our Google drive. Download them and save to assets/stats/ in the code repo.

Usage

Train and evaluate our models through main.py. Here are some common options:

main.py:
  --config: Training configuration.
    (default: 'None')
  --eval_folder: The folder name for storing evaluation results
    (default: 'eval')
  --mode: <train|eval|train_deq>: Running mode: train or eval or training the Flow++ variational dequantization model
  --workdir: Working directory
  • config is the path to the config file. Our config files are provided in configs/. They are formatted according to ml_collections and should be quite self-explanatory.

    Naming conventions of config files: the name of a config file contains the following attributes:

    • dataset: Either cifar10 or imagenet32
    • model: Either ddpmpp_continuous or ddpmpp_deep_continuous
  • workdir is the path that stores all artifacts of one experiment, like checkpoints, samples, and evaluation results.

  • eval_folder is the name of a subfolder in workdir that stores all artifacts of the evaluation process, like meta checkpoints for supporting pre-emption recovery, image samples, and numpy dumps of quantitative results.

  • mode is either "train" or "eval" or "train_deq". When set to "train", it starts the training of a new model, or resumes the training of an old model if its meta-checkpoints (for resuming running after pre-emption in a cloud environment) exist in workdir/checkpoints-meta . When set to "eval", it can do the following:

    • Compute the log-likelihood on the training or test dataset.

    • Compute the lower bound of the log-likelihood on the training or test dataset.

    • Evaluate the loss function on the test / validation dataset.

    • Generate a fixed number of samples and compute its Inception score, FID, or KID. Prior to evaluation, stats files must have already been downloaded/computed and stored in assets/stats.

      When set to "train_deq", it trains a Flow++ variational dequantization model to bridge the gap of likelihoods on continuous and discrete images. Recommended if you want to compete with generative models trained on discrete images, such as VAEs and autoregressive models. train_deq mode also supports pre-emption recovery.

These functionalities can be configured through config files, or more conveniently, through the command-line support of the ml_collections package.

Configurations for training

To turn on likelihood weighting, set --config.training.likelihood_weighting. To additionally turn on importance sampling for variance reduction, use --config.training.likelihood_weighting. To train a separate Flow++ variational dequantizer, you need to first finish training a score-based model, then use --mode=train_deq.

Configurations for evaluation

To generate samples and evaluate sample quality, use the --config.eval.enable_sampling flag; to compute log-likelihoods, use the --config.eval.enable_bpd flag, and specify --config.eval.dataset=train/test to indicate whether to compute the likelihoods on the training or test dataset. Turn on --config.eval.bound to evaluate the variational bound for the log-likelihood. Enable --config.eval.dequantizer to use variational dequantization for likelihood computation. --config.eval.num_repeats configures the number of repetitions across the dataset (more can reduce the variance of the likelihoods; default to 5).

Pretrained checkpoints

All checkpoints are provided in this Google drive.

Folder structure:

  • assets: contains cifar10_stats.npz and imagenet32_stats.npz. Necessary for computing FID and Inception scores.
  • <cifar10|imagenet32>_(deep)_<vp|subvp>_(likelihood)_(iw)_(flip). Here the part enclosed in () is optional. deep in the name specifies whether the score model is a deeper architecture (ddpmpp_deep_continuous). likelihood specifies whether the model was trained with likelihood weighting. iw specifies whether the model was trained with importance sampling for variance reduction. flip shows whether the model was trained with horizontal flip for data augmentation. Each folder has the following two subfolders:
    • checkpoints: contains the last checkpoint for the score-based model.
    • flowpp_dequantizer/checkpoints: contains the last checkpoint for the Flow++ variational dequantization model.

References

If you find the code useful for your research, please consider citing

@inproceedings{song2021maximum,
  title={Maximum Likelihood Training of Score-Based Diffusion Models},
  author={Song, Yang and Durkan, Conor and Murray, Iain and Ermon, Stefano},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}

This work is built upon some previous papers which might also interest you:

  • Yang Song and Stefano Ermon. "Generative Modeling by Estimating Gradients of the Data Distribution." Proceedings of the 33rd Annual Conference on Neural Information Processing Systems, 2019.
  • Yang Song and Stefano Ermon. "Improved techniques for training score-based generative models." Proceedings of the 34th Annual Conference on Neural Information Processing Systems, 2020.
  • Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. "Score-Based Generative Modeling through Stochastic Differential Equations". Proceedings of the 9th International Conference on Learning Representations, 2021.
Owner
Yang Song
PhD Candidate in Stanford AI Lab
Yang Song
An implementation of chunked, compressed, N-dimensional arrays for Python.

Zarr Latest Release Package Status License Build Status Coverage Downloads Gitter Citation What is it? Zarr is a Python package providing an implement

Zarr Developers 1.1k Dec 30, 2022
Face Mask Detection on Image and Video using tensorflow and keras

Face-Mask-Detection Face Mask Detection on Image and Video using tensorflow and keras Train Neural Network on face-mask dataset using tensorflow and k

Nahid Ebrahimian 12 Nov 11, 2022
Discretized Integrated Gradients for Explaining Language Models (EMNLP 2021)

Discretized Integrated Gradients for Explaining Language Models (EMNLP 2021) Overview of paths used in DIG and IG. w is the word being attributed. The

INK Lab @ USC 17 Oct 27, 2022
ML-Ensemble – high performance ensemble learning

A Python library for high performance ensemble learning ML-Ensemble combines a Scikit-learn high-level API with a low-level computational graph framew

Sebastian Flennerhag 764 Dec 31, 2022
ICON: Implicit Clothed humans Obtained from Normals

ICON: Implicit Clothed humans Obtained from Normals arXiv, December 2021. Yuliang Xiu · Jinlong Yang · Dimitrios Tzionas · Michael J. Black Table of C

Yuliang Xiu 1.1k Dec 30, 2022
Deep Probabilistic Programming Course @ DIKU

Deep Probabilistic Programming Course @ DIKU

52 May 14, 2022
[CVPR 2022 Oral] Rethinking Minimal Sufficient Representation in Contrastive Learning

Rethinking Minimal Sufficient Representation in Contrastive Learning PyTorch implementation of Rethinking Minimal Sufficient Representation in Contras

36 Nov 23, 2022
Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images"

GANInversion_with_ConsecutiveImgs Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images" https://a

QingyangXu 38 Dec 07, 2022
(EI 2022) Controllable Confidence-Based Image Denoising

Image Denoising with Control over Deep Network Hallucination Paper and arXiv preprint -- Our frequency-domain insights derive from SFM and the concept

Images and Visual Representation Laboratory (IVRL) at EPFL 5 Dec 18, 2022
A scikit-learn-compatible module for estimating prediction intervals.

MAPIE - Model Agnostic Prediction Interval Estimator MAPIE allows you to easily estimate prediction intervals (or prediction sets) using your favourit

588 Jan 04, 2023
covid question answering datasets and fine tuned models

Covid-QA Fine tuned models for question answering on Covid-19 data. Hosted Inference This model has been contributed to huggingface.Click here to see

Abhijith Neil Abraham 19 Sep 09, 2021
BRepNet: A topological message passing system for solid models

BRepNet: A topological message passing system for solid models This repository contains the an implementation of BRepNet: A topological message passin

Autodesk AI Lab 42 Dec 30, 2022
AbelNN: Deep Learning Python module from scratch

AbelNN: Deep Learning Python module from scratch I have implemented several neural networks from scratch using only Numpy. I have designed the module

Abel 2 Apr 12, 2022
The best solution of the Weather Prediction track in the Yandex Shifts challenge

yandex-shifts-weather The repository contains information about my solution for the Weather Prediction track in the Yandex Shifts challenge https://re

Ivan Yu. Bondarenko 15 Dec 18, 2022
Python and Julia in harmony.

PythonCall & JuliaCall Bringing Python® and Julia together in seamless harmony: Call Python code from Julia and Julia code from Python via a symmetric

Christopher Rowley 414 Jan 07, 2023
Scaling and Benchmarking Self-Supervised Visual Representation Learning

FAIR Self-Supervision Benchmark is deprecated. Please see VISSL, a ground-up rewrite of benchmark in PyTorch. FAIR Self-Supervision Benchmark This cod

Meta Research 584 Dec 31, 2022
공공장소에서 눈만 돌리면 CCTV가 보인다는 말이 과언이 아닐 정도로 CCTV가 우리 생활에 깊숙이 자리 잡았습니다.

ObsCare_Main 소개 공공장소에서 눈만 돌리면 CCTV가 보인다는 말이 과언이 아닐 정도로 CCTV가 우리 생활에 깊숙이 자리 잡았습니다. CCTV의 대수가 급격히 늘어나면서 관리와 효율성 문제와 더불어, 곳곳에 설치된 CCTV를 개별 관제하는 것으로는 응급 상

5 Jul 07, 2022
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

AdaFocusV2 This repo contains the official code and pre-trained models for AdaFo

79 Dec 26, 2022
subpixel: A subpixel convnet for super resolution with Tensorflow

subpixel: A subpixel convolutional neural network implementation with Tensorflow Left: input images / Right: output images with 4x super-resolution af

Atrium LTS 2.1k Dec 23, 2022
An open source library for face detection in images. The face detection speed can reach 1000FPS.

libfacedetection This is an open source library for CNN-based face detection in images. The CNN model has been converted to static variables in C sour

Shiqi Yu 11.4k Dec 27, 2022