PyTorch implementations of Bayes by Backprop, MC Dropout, SGLD, the Local Reparametrisation Trick, KF-Laplace, SG-HMC and more

Overview

Bayesian Neural Networks

License: MIT · Python 2.7+ · PyTorch 1.0

PyTorch implementations for the following approximate inference methods:

  • Bayes by Backprop
  • Bayes by Backprop + Local Reparametrisation Trick
  • MC Dropout
  • Stochastic Gradient Langevin Dynamics (SGLD)
  • Preconditioned SGLD (pSGLD)
  • Kronecker-Factorised Laplace Approximation
  • Stochastic Gradient Hamiltonian Monte Carlo with scale adaption

We also provide code for:

  • MAP inference
  • Bootstrap MAP Ensembles

Prerequisites

  • PyTorch
  • Numpy
  • Matplotlib

The project is written in Python 2.7 and PyTorch 1.0.1. If CUDA is available, it will be used automatically. The models can also run on CPU, as they are not excessively big.

Usage

Structure

Regression experiments

We carried out homoscedastic and heteroscedastic regression experiments on toy datasets, generated with a Gaussian Process ground truth, as well as on real data (six UCI datasets).

Notebooks/regression/(ModelName)_(ExperimentType).ipynb: Contains experiments using (ModelName) on (ExperimentType), i.e. homoscedastic/heteroscedastic. The heteroscedastic notebooks contain both toy and UCI dataset experiments for a given (ModelName).

We also provide Google Colab notebooks, which let you run the experiments on a GPU for free. No modifications are required - all dependencies and datasets are set up from within the notebooks - other than selecting Runtime -> Change runtime type -> Hardware accelerator -> GPU.

MNIST classification experiments

train_(ModelName)_(Dataset).py: Trains (ModelName) on (Dataset). Training metrics and model weights will be saved to the specified directories.

src/: General utilities and model definitions.

Notebooks/classification: An assortment of notebooks for model training, evaluation and digit-rotation uncertainty experiments. They also support weight distribution plotting, weight pruning, and loading pre-trained models for experimentation.

Bayes by Backprop (BBP)

(https://arxiv.org/abs/1505.05424)

Colab notebooks with regression models: BBP homoscedastic / heteroscedastic

Train a model on MNIST:

python train_BayesByBackprop_MNIST.py [--model [MODEL]] [--prior_sig [PRIOR_SIG]] [--epochs [EPOCHS]] [--lr [LR]] [--n_samples [N_SAMPLES]] [--models_dir [MODELS_DIR]] [--results_dir [RESULTS_DIR]]

For an explanation of the script's arguments:

python train_BayesByBackprop_MNIST.py -h

Best results are obtained with a Laplace prior.

Local Reparametrisation Trick

(https://arxiv.org/abs/1506.02557)

Bayes by Backprop inference where the mean and variance of each activation are calculated in closed form and activations, rather than weights, are sampled. This makes the variance of the Monte Carlo ELBO estimator scale as 1/M, where M is the minibatch size, whereas sampling weights makes it scale as (M-1)/M. The KL divergence between Gaussians can also be computed in closed form, further reducing variance. Each epoch is computed faster, and convergence is faster too.
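
As a minimal sketch of the idea (illustrative, not the repository's exact implementation), a Bayesian linear layer using the local reparametrisation trick might look as follows, assuming a fully factorised Gaussian posterior over the weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalReparamLinear(nn.Module):
    """Sketch of a Bayesian linear layer with the local reparametrisation
    trick. Names and initialisation values are illustrative."""

    def __init__(self, n_in, n_out):
        super().__init__()
        self.W_mu = nn.Parameter(torch.zeros(n_in, n_out).normal_(0, 0.1))
        # rho parametrises the std via softplus so it stays positive
        self.W_rho = nn.Parameter(torch.full((n_in, n_out), -3.0))

    def forward(self, x):
        W_sigma = F.softplus(self.W_rho)
        # Closed-form mean and variance of the pre-activations ...
        act_mu = x @ self.W_mu
        act_var = (x ** 2) @ (W_sigma ** 2)
        # ... then sample activations instead of weights
        eps = torch.randn_like(act_mu)
        return act_mu + act_var.sqrt() * eps
```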

Train a model on MNIST:

python train_BayesByBackprop_MNIST.py --model Local_Reparam [--prior_sig [PRIOR_SIG]] [--epochs [EPOCHS]] [--lr [LR]] [--n_samples [N_SAMPLES]] [--models_dir [MODELS_DIR]] [--results_dir [RESULTS_DIR]]

MC Dropout

(https://arxiv.org/abs/1506.02142)

A fixed dropout rate of 0.5 is set.
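
Predictions are made with dropout kept active at test time and averaged over stochastic forward passes. A hedged sketch, assuming the model outputs logits and dropout is its only stochastic layer:

```python
import torch

def mc_predict(model, x, n_samples=100):
    """Sketch of MC Dropout prediction: model.train() keeps nn.Dropout
    stochastic at test time; we average softmax outputs over samples."""
    model.train()
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1)
                             for _ in range(n_samples)])
    return probs.mean(0), probs  # predictive mean and raw MC samples
```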

Colab notebooks with regression models: MC Dropout homoscedastic heteroscedastic

Train a model on MNIST:

python train_MCDropout_MNIST.py [--weight_decay [WEIGHT_DECAY]] [--epochs [EPOCHS]] [--lr [LR]] [--models_dir [MODELS_DIR]] [--results_dir [RESULTS_DIR]]

For an explanation of the script's arguments:

python train_MCDropout_MNIST.py -h

Stochastic Gradient Langevin Dynamics (SGLD)

(https://www.ics.uci.edu/~welling/publications/papers/stoclangevin_v6.pdf)

In order to converge to the true posterior over w, the learning rate should be annealed according to the Robbins-Monro conditions. In practice, we use a fixed learning rate.
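
For reference, a sketch of a single SGLD step, assuming the gradients hold the minibatch estimate of the negative log posterior (data term rescaled by N/M); names are illustrative:

```python
import torch

def sgld_step(params, lr):
    """One SGLD update: theta <- theta - (lr/2) * grad + N(0, lr).
    Assumes loss.backward() has been called on the minibatch estimate
    of the negative log posterior."""
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            noise = torch.randn_like(p) * lr ** 0.5
            p.add_(-0.5 * lr * p.grad + noise)
```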

Colab notebooks with regression models: SGLD homoscedastic / heteroscedastic

Train a model on MNIST:

python train_SGLD_MNIST.py [--use_preconditioning [USE_PRECONDITIONING]] [--prior_sig [PRIOR_SIG]] [--epochs [EPOCHS]] [--lr [LR]] [--models_dir [MODELS_DIR]] [--results_dir [RESULTS_DIR]]

For an explanation of the script's arguments:

python train_SGLD_MNIST.py -h

pSGLD

(https://arxiv.org/abs/1512.07666)

SGLD with RMSprop preconditioning. A higher learning rate should be used than for vanilla SGLD.
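
A sketch of the corresponding preconditioned update, using an RMSprop-style running average of squared gradients; the small curvature correction term of the original paper is omitted for brevity:

```python
import torch

def psgld_step(params, state, lr, alpha=0.99, eps=1e-5):
    """Sketch of an RMSprop-preconditioned SGLD (pSGLD) step. `state`
    maps each parameter to its running average of squared gradients."""
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            v = state.setdefault(p, torch.zeros_like(p))
            v.mul_(alpha).addcmul_(p.grad, p.grad, value=1 - alpha)
            G = 1.0 / (v.sqrt() + eps)  # diagonal preconditioner
            noise = torch.randn_like(p) * (lr * G).sqrt()
            p.add_(-0.5 * lr * G * p.grad + noise)
```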

Train a model on MNIST:

python train_SGLD_MNIST.py --use_preconditioning True [--prior_sig [PRIOR_SIG]] [--epochs [EPOCHS]] [--lr [LR]] [--models_dir [MODELS_DIR]] [--results_dir [RESULTS_DIR]]

Bootstrap MAP Ensemble

Multiple networks are trained on subsamples of the dataset.
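
Each ensemble member sees a different subsample of the data, along the lines of this sketch (the `subsample` fraction mirrors the script's --subsample flag; sampling with replacement is an assumption):

```python
import numpy as np

def bootstrap_subsample(n_data, subsample=1.0, rng=None):
    """Sketch: draw a bootstrap subsample of training indices for one
    ensemble member, sampling with replacement."""
    rng = np.random.default_rng() if rng is None else rng
    n_draw = int(subsample * n_data)
    return rng.choice(n_data, size=n_draw, replace=True)
```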

Colab notebooks with regression models: MAP Ensemble homoscedastic / heteroscedastic

Train an ensemble on MNIST:

python train_Bootrap_Ensemble_MNIST.py [--weight_decay [WEIGHT_DECAY]] [--subsample [SUBSAMPLE]] [--n_nets [N_NETS]] [--epochs [EPOCHS]] [--lr [LR]] [--models_dir [MODELS_DIR]] [--results_dir [RESULTS_DIR]]

For an explanation of the script's arguments:

python train_Bootrap_Ensemble_MNIST.py -h

Kronecker-Factorised Laplace

(https://openreview.net/pdf?id=Skdvd2xAZ)

Train a MAP network and then calculate a second-order Taylor series approximation to the curvature around a mode of the posterior. A block-diagonal Hessian approximation is used, where only intra-layer dependencies are accounted for. The Hessian is further approximated as the Kronecker product of the expectation of a single datapoint's Hessian factors. Approximating the Hessian can take a while. Fortunately it only needs to be done once.

Train a MAP network on MNIST and approximate Hessian:

python train_KFLaplace_MNIST.py [--weight_decay [WEIGHT_DECAY]] [--hessian_diag_sig [HESSIAN_DIAG_SIG]] [--epochs [EPOCHS]] [--lr [LR]] [--models_dir [MODELS_DIR]] [--results_dir [RESULTS_DIR]]

For an explanation of the script's arguments:

python train_KFLaplace_MNIST.py -h

Note that we save the unscaled and uninverted Hessian factors. This allows for computationally cheap changes to the prior at inference time, as the Hessian does not need to be re-computed. Inference requires inverting the approximated Hessian factors and sampling from a matrix normal distribution. This is shown in notebooks/KFAC_Laplace_MNIST.ipynb.
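
As a sketch of that last step, sampling one layer's weights from a matrix normal given already inverted and scaled Kronecker factors U and V (assuming a recent PyTorch with torch.linalg; names are illustrative):

```python
import torch

def sample_matrix_normal(W_map, U, V):
    """Sketch: draw a weight sample from MN(W_map, U, V) via Cholesky
    factors, i.e. W = W_map + chol(U) @ E @ chol(V).T with E ~ N(0, I)."""
    Lu = torch.linalg.cholesky(U)
    Lv = torch.linalg.cholesky(V)
    E = torch.randn_like(W_map)
    return W_map + Lu @ E @ Lv.T
```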

Stochastic Gradient Hamiltonian Monte Carlo

(https://arxiv.org/abs/1402.4102)

We implement the scale-adapted version of this algorithm, proposed in Springenberg et al. (2016), which finds hyperparameters automatically during burn-in. We place a Gaussian prior over network weights and a Gamma hyperprior over the Gaussian's precision.
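
For orientation, a sketch of the plain SGHMC update of Chen et al. (2014), on which the scale-adapted version builds; the burn-in scale adaptation itself is omitted here:

```python
import torch

def sghmc_step(params, momenta, lr, alpha=0.01):
    """Sketch of one SGHMC update. `momenta` maps parameters to their
    auxiliary momentum buffers; gradients hold the minibatch estimate
    of the negative log posterior."""
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            v = momenta.setdefault(p, torch.zeros_like(p))
            noise = torch.randn_like(p) * (2 * alpha * lr) ** 0.5
            v.mul_(1 - alpha).add_(-lr * p.grad + noise)
            p.add_(v)
```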

Run the SG-HMC-SA burn-in and sampler, saving weights to the specified directory:

python train_SGHMC_MNIST.py [--epochs [EPOCHS]] [--sample_freq [SAMPLE_FREQ]] [--burn_in [BURN_IN]] [--lr [LR]] [--models_dir [MODELS_DIR]] [--results_dir [RESULTS_DIR]]

For an explanation of the script's arguments:

python train_SGHMC_MNIST.py -h

Approximate Inference in Neural Networks

MAP inference provides a point estimate of parameter values. When provided with out-of-distribution inputs, such as rotated digits, these models tend to make wrong predictions with high confidence.

Uncertainty Decomposition

We can measure uncertainty in our models' predictions through predictive entropy. We can decompose this term in order to distinguish between two types of uncertainty. Uncertainty caused by noise in the data (aleatoric uncertainty) can be quantified as the expected entropy of the model's predictions. Model uncertainty (epistemic uncertainty) can be measured as the difference between the total entropy and the aleatoric entropy.
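
A sketch of this decomposition from MC samples of the softmax outputs (shape conventions are illustrative):

```python
import torch

def decompose_uncertainty(probs):
    """Decompose predictive entropy into aleatoric and epistemic parts.
    `probs` has shape (n_mc_samples, n_points, n_classes) and holds
    softmax outputs from MC samples of the weights."""
    eps = 1e-12
    mean_p = probs.mean(0)
    total = -(mean_p * (mean_p + eps).log()).sum(-1)            # H[E[p]]
    aleatoric = -(probs * (probs + eps).log()).sum(-1).mean(0)  # E[H[p]]
    epistemic = total - aleatoric                               # mutual information
    return total, aleatoric, epistemic
```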

Results

Homoscedastic Regression

Toy homoscedastic regression task. Data is generated by a GP with an RBF kernel (l = 1, σn = 0.3). We use a single-output FC network with one hidden layer of 200 ReLU units to predict the regression mean μ(x). A fixed log σ is learnt separately.

Heteroscedastic Regression

Same scenario as previous section but log σ(x) is predicted from the input.

Toy heteroscedastic regression task. Data is generated by a GP with an RBF kernel (l = 1, σn = 0.3 · |x + 2|). We use a two-head network with 200 ReLU units to predict the regression mean μ(x) and log-standard deviation log σ(x).
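
A minimal sketch of such a two-head network (layer names are illustrative):

```python
import torch
import torch.nn as nn

class HeteroscedasticNet(nn.Module):
    """Sketch of the two-head regression network described above: a
    shared hidden layer of 200 ReLU units with separate linear heads
    for the mean mu(x) and the log standard deviation log sigma(x)."""

    def __init__(self, n_in=1, n_hid=200):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_in, n_hid), nn.ReLU())
        self.mu_head = nn.Linear(n_hid, 1)
        self.logsig_head = nn.Linear(n_hid, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mu_head(h), self.logsig_head(h)
```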

Regression on UCI datasets

We performed heteroscedastic regression on six UCI datasets (housing, concrete, energy efficiency, power plant, red wine and yacht), using 10-fold cross validation. All these experiments are contained in the heteroscedastic notebooks. Note that results depend heavily on hyperparameter selection. The plots below show log-likelihoods and RMSEs on the train (semi-transparent colour) and test (solid colour) sets. Circles and error bars correspond to the 10-fold cross-validation means and standard deviations respectively.

MNIST Classification

W is marginalised with 100 samples of the weights for all models except MAP, where only one set of weights is used.

| MNIST Test | MAP | MAP Ensemble | BBP Gaussian | BBP GMM | BBP Laplace | BBP Local Reparam | MC Dropout | SGLD | pSGLD |
|---|---|---|---|---|---|---|---|---|---|
| Log Like | -572.9 | -496.54 | -1100.29 | -1008.28 | -892.85 | -1086.43 | -435.458 | -828.29 | -661.25 |
| Error % | 1.58 | 1.53 | 2.60 | 2.38 | 2.28 | 2.61 | 1.37 | 1.76 | 1.76 |

MNIST test results for the methods under consideration. Extensive hyperparameter tuning has not been performed. We approximate the posterior predictive distribution with 100 MC samples. We use an FC network with two 1200-unit ReLU layers. If unspecified, the prior is Gaussian with std=0.1. pSGLD uses RMSprop preconditioning.

The original paper for Bayes By Backprop reports around 1% error on MNIST. We find that this result is attainable only if approximate posterior variances are initialised to be very small (BBP Gauss 2). In this scenario, the distributions over weights resemble deltas, giving good predictive performance but bad uncertainty estimates. However, when initialising the variances to match the prior (BBP Gauss 1), we obtain the above results. The training curves for both of these hyperparameter configuration schemes are shown below:

MNIST Uncertainty

Total, aleatoric and epistemic uncertainties obtained when creating OOD samples by augmenting the MNIST test set with rotations:

Total and epistemic uncertainties obtained by testing our models - which have been trained on MNIST - on the KMNIST dataset:

Adversarial robustness

Total, aleatoric and epistemic uncertainties obtained when feeding our models adversarial samples (FGSM):

Weight Distributions

Histograms of weights sampled from each model trained on MNIST. We draw 10 samples of w for each model.

Weight Pruning

#TODO
