Code for Multinomial Diffusion

Abstract

Generative flows and diffusion models have been predominantly trained on ordinal data, for example natural images. This paper introduces two extensions of flows and diffusion for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined by a composition of a continuous distribution (such as a normalizing flow), and an argmax function. To optimize this model, we learn a probabilistic inverse for the argmax that lifts the categorical data to a continuous space. Multinomial Diffusion gradually adds categorical noise in a diffusion process, for which the generative denoising process is learned. We demonstrate that our method outperforms existing dequantization approaches on text modelling and modelling on image segmentation maps in log-likelihood.
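To make "gradually adds categorical noise" concrete, here is a minimal sketch of one forward noising step. It assumes the transition q(x_t | x_{t-1}) = Cat((1 - beta_t) * x_{t-1} + beta_t / K) over K classes. The function names and the beta values are hypothetical illustrations, not the API or the noise schedule of this repository.

```python
import random

def noise_step_probs(x_onehot, beta):
    """Class probabilities of Cat((1 - beta) * x_prev + beta / K)."""
    k = len(x_onehot)
    return [(1.0 - beta) * v + beta / k for v in x_onehot]

def sample_noise_step(x_index, num_classes, beta, rng):
    """Sample the class at step t given the class index at step t-1."""
    onehot = [1.0 if i == x_index else 0.0 for i in range(num_classes)]
    probs = noise_step_probs(onehot, beta)
    return rng.choices(range(num_classes), weights=probs, k=1)[0]

# One step from class 1 out of K = 4 classes with beta = 0.2:
# the current class keeps 0.8 + 0.05 = 0.85, every other class gets 0.05.
probs = noise_step_probs([0.0, 1.0, 0.0, 0.0], beta=0.2)

# Run a short chain with an illustrative (assumed) increasing schedule.
rng = random.Random(0)
x = 1
for beta in [0.05, 0.1, 0.2, 0.4]:
    x = sample_noise_step(x, 4, beta, rng)
```

As beta grows, each step pulls the distribution closer to uniform over the K classes. The generative model is trained to reverse such steps.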

Link: https://arxiv.org/abs/2102.05379

Instructions

In the folder containing setup.py, run

pip install --user -e .

The --user option installs the library for your user only. The -e option installs it in editable mode, so changes to the source take effect without reinstalling.

You should now be able to use it.

Running experiments

Go to the folder of the experiment of interest (segmentation_diffusion or text_diffusion) and follow the instructions in the readme there.

Acknowledgements

The Robert Bosch GmbH is acknowledged for financial support.
