A simple and extensible library to create Bayesian Neural Network layers on PyTorch.

Overview

BLiTZ - Bayesian Layers in Torch Zoo

BLiTZ is a simple and extensible library to create Bayesian Neural Network layers (based on what is proposed in the Weight Uncertainty in Neural Networks paper) on PyTorch. By using BLiTZ layers and utils, you can add uncertainty and gather the complexity cost of your model in a simple way that does not affect the interaction between your layers, as if you were using standard PyTorch.

By using our core weight sampler classes, you can extend and improve this library to add uncertainty to a wider range of layers, in a way that stays well integrated with PyTorch. Pull requests are welcome.

Our objective is to empower people to apply Bayesian Deep Learning by letting them focus on their idea rather than on the hard-coding part.

Roadmap:

  • Enable reparametrization for posterior distributions other than Normal.

Install

To install BLiTZ you can use pip command:

pip install blitz-bayesian-pytorch

You can also git-clone it and pip-install it locally:

conda create -n blitz python=3.6
conda activate blitz
git clone https://github.com/piEsposito/blitz-bayesian-deep-learning.git
cd blitz-bayesian-deep-learning
pip install .

Documentation

Documentation for our layers, weight (and prior distribution) sampler and utils:

A simple example for regression

(You can see it for yourself by running this example on your machine.)

We will now see how Bayesian Deep Learning can be used for regression in order to obtain a confidence interval for each datapoint rather than a single point prediction. A confidence interval for your prediction may be even more useful than a low-error estimate.

My argument rests on the fact that, in some contexts, a high-probability confidence interval lets you make a more reliable decision than a very close point estimate: if you are trying to profit from a trading operation, for example, a good confidence interval may tell you whether, at least, the value at which the operation will proceed will be lower (or higher) than some given X.

Knowing whether a value will surely (or with good probability) lie in a given interval can support sensible decisions better than a very close estimate that, if it falls below or above some limit value, may cause a loss on a transaction. The point is that, sometimes, knowing whether there will be profit is more useful than measuring it.

To demonstrate that, we will create a Bayesian Neural Network regressor for the Boston housing toy dataset and build a confidence interval (CI) for each house whose price we are trying to predict. We will perform some scaling, and the CI will be about 75%. It will be interesting to see that for about 90% of the predictions the true value is below the upper limit OR (inclusive) above the lower one.

Importing the necessary modules

Besides the usual modules, we will import from BLiTZ the variational_estimator decorator, which helps us handle the BayesianLinear layers in the module while keeping it fully integrated with the rest of Torch, and, of course, BayesianLinear, which is our layer that features weight uncertainty.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator

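# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older scikit-learn release.
from sklearn.datasets import load_boston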
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

Loading and scaling data

Nothing new under the sun here: we load the data and standard-scale it to help with training.

X, y = load_boston(return_X_y=True)
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(np.expand_dims(y, -1))

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=.25,
                                                    random_state=42)


X_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()
X_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()

Creating our variational regressor class

We can create our class by inheriting from nn.Module, as we would with any Torch network. Our decorator introduces the methods that handle the Bayesian features, such as calculating the complexity cost of the Bayesian layers and doing many feedforward passes (sampling different weights on each one) in order to sample our loss.

@variational_estimator
class BayesianRegressor(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        #self.linear = nn.Linear(input_dim, output_dim)
        self.blinear1 = BayesianLinear(input_dim, 512)
        self.blinear2 = BayesianLinear(512, output_dim)
        
    def forward(self, x):
        x_ = self.blinear1(x)
        x_ = F.relu(x_)
        return self.blinear2(x_)

Defining a confidence interval evaluating function

This function creates a confidence interval for each prediction in the batch for which we are trying to sample the label value. We can then measure the accuracy of our predictions by checking what fraction of the prediction distributions actually included the correct label for the datapoint.

def evaluate_regression(regressor,
                        X,
                        y,
                        samples = 100,
                        std_multiplier = 2):
    preds = [regressor(X) for i in range(samples)]
    preds = torch.stack(preds)
    means = preds.mean(axis=0)
    stds = preds.std(axis=0)
    ci_upper = means + (std_multiplier * stds)
    ci_lower = means - (std_multiplier * stds)
    ic_acc = (ci_lower <= y) * (ci_upper >= y)
    ic_acc = ic_acc.float().mean()
    return ic_acc, (ci_upper >= y).float().mean(), (ci_lower <= y).float().mean()

Creating our regressor and loading data

Notice here that we create our BayesianRegressor as we would do with other neural networks.

regressor = BayesianRegressor(13, 1)
optimizer = optim.Adam(regressor.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()

ds_train = torch.utils.data.TensorDataset(X_train, y_train)
dataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)

ds_test = torch.utils.data.TensorDataset(X_test, y_test)
dataloader_test = torch.utils.data.DataLoader(ds_test, batch_size=16, shuffle=True)

Our main training and evaluating loop

Our training loop differs from a common Torch training loop only in that its loss is sampled by the model's sample_elbo method. Everything else can be done normally, as our purpose with BLiTZ is to make it easy to iterate on your data with different Bayesian NNs.

Here is our very simple training loop:

iteration = 0
for epoch in range(100):
    for i, (datapoints, labels) in enumerate(dataloader_train):
        optimizer.zero_grad()
        
        loss = regressor.sample_elbo(inputs=datapoints,
                           labels=labels,
                           criterion=criterion,
                           sample_nbr=3)
        loss.backward()
        optimizer.step()
        
        iteration += 1
        if iteration%100==0:
            ic_acc, under_ci_upper, over_ci_lower = evaluate_regression(regressor,
                                                                        X_test,
                                                                        y_test,
                                                                        samples=25,
                                                                        std_multiplier=3)
            
            print("CI acc: {:.2f}, CI upper acc: {:.2f}, CI lower acc: {:.2f}".format(ic_acc, under_ci_upper, over_ci_lower))
            print("Loss: {:.4f}".format(loss))

Bayesian Deep Learning in a Nutshell

A very fast explanation of how uncertainty is introduced in Bayesian Neural Networks and how we model their loss in order to objectively improve the confidence of their predictions and reduce the variance without dropout.

First of all, a deterministic NN layer linear transformation

As we know, in deterministic (non-Bayesian) neural network layers, the trainable parameters correspond directly to the weights used in the linear transformation of the previous layer's output (or of the input, if that is the case). It corresponds to the following equation:

$$a^{(i+1)} = W^{(i+1)} \cdot z^{(i)} + b^{(i+1)}$$

($z^{(i)}$ corresponds to the activated output of layer i)

The purpose of Bayesian Layers

Bayesian layers seek to introduce uncertainty on their weights by sampling them, on each feedforward operation, from a distribution parametrized by trainable variables.

This allows us not just to optimize the performance metrics of the model, but also to gather the uncertainty of the network predictions over a specific datapoint (by sampling it many times and measuring the dispersion) and to reduce the variance of the network over the prediction as much as possible, making it possible to know how much uncertainty we still have over the label when we model it as a function of our specific datapoint.
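The idea of sampling a prediction many times and measuring its dispersion can be sketched in plain Python. This is only an illustration under simplifying assumptions: the "layer" is a single weight drawn from a Normal distribution, and the names stochastic_predict and predictive_interval are hypothetical helpers, not part of BLiTZ.

```python
import random
import statistics


def stochastic_predict(x, mu, sigma, rng):
    """One forward pass of a one-weight 'layer': draw a fresh weight, then predict."""
    w = rng.gauss(mu, sigma)  # a new weight sample on every call
    return w * x


def predictive_interval(x, mu, sigma, samples=2000, std_multiplier=2.0, seed=0):
    """Repeat the stochastic forward pass and summarise the spread of the outputs."""
    rng = random.Random(seed)
    preds = [stochastic_predict(x, mu, sigma, rng) for _ in range(samples)]
    mean = statistics.mean(preds)
    std = statistics.stdev(preds)
    return mean, std, (mean - std_multiplier * std, mean + std_multiplier * std)


# The dispersion of the predictions reflects the uncertainty on the weight:
# here it should be close to sigma * |x| = 0.2 * 3.0.
mean, std, (lower, upper) = predictive_interval(x=3.0, mu=1.5, sigma=0.2)
print(f"mean={mean:.3f} std={std:.3f} interval=({lower:.3f}, {upper:.3f})")
```

A narrower weight distribution (smaller sigma) gives a narrower interval, which is the sense in which training can reduce the variance of the network over a prediction.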

Weight sampling on Bayesian Layers

To do so, on each feedforward operation we sample the parameters of the linear transformation with the following equations (where ρ parametrizes the standard deviation and μ parametrizes the mean of the sampled linear transformation parameters):

For the weights:

$$W^{(i)}_{(n)} = \mu^{(i)}_{W} + \log\left(1 + e^{\rho^{(i)}_{W}}\right) \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)$$

Where the sampled W corresponds to the weights used on the linear transformation for the ith layer on the nth sample.

For the biases:

$$b^{(i)}_{(n)} = \mu^{(i)}_{b} + \log\left(1 + e^{\rho^{(i)}_{b}}\right) \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)$$

Where the sampled b corresponds to the biases used on the linear transformation for the ith layer on the nth sample.
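The sampling step described above can be sketched for a single scalar weight. This is a minimal illustration, assuming the softplus mapping from ρ to the standard deviation, σ = log(1 + exp(ρ)), as used in the Weight Uncertainty in Neural Networks paper; softplus and sample_weight are hypothetical helper names, not BLiTZ's own sampler code.

```python
import math
import random
import statistics


def softplus(rho):
    """sigma = log(1 + exp(rho)); always positive, so rho can be any real number."""
    return math.log1p(math.exp(rho))


def sample_weight(mu, rho, rng):
    """Reparameterised draw: w = mu + softplus(rho) * eps, with eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return mu + softplus(rho) * eps


rng = random.Random(0)
mu, rho = 0.3, -1.0
draws = [sample_weight(mu, rho, rng) for _ in range(5000)]

# The draws should be centred on mu with a spread close to softplus(rho).
sample_mean = statistics.mean(draws)
sample_std = statistics.stdev(draws)
print(f"mean={sample_mean:.3f} std={sample_std:.3f} softplus(rho)={softplus(rho):.3f}")
```

Because the randomness lives entirely in eps, the sampled weight is a deterministic, differentiable function of μ and ρ, which is what makes these two quantities trainable.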

It is possible to optimize our trainable weights

Even though we have a random multiplier for our weights and biases, it is possible to optimize them: given some differentiable function of the sampled weights and the trainable parameters (in our case, the loss), we sum the derivatives of the function with respect to both of them:

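  1. Let $\epsilon \sim \mathcal{N}(0, 1)$
  2. Let $\theta = (\rho, \mu)$
  3. Let $w = \mu + \log(1 + e^{\rho}) \cdot \epsilon$
  4. Let $f(w, \theta)$ be differentiable relative to its variables

Therefore:

  1. $\Delta_{\mu} = \frac{\partial f}{\partial w} + \frac{\partial f}{\partial \mu}$

and

  1. $\Delta_{\rho} = \frac{\partial f}{\partial w} \cdot \frac{\epsilon}{1 + e^{-\rho}} + \frac{\partial f}{\partial \rho}$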
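The claim that the trainable parameters receive well-defined gradients through the sampled weight can be checked numerically. The sketch below assumes w = μ + log(1 + exp(ρ)) · ε (the softplus reparameterization from the Weight Uncertainty in Neural Networks paper) and a toy function f that depends on the parameters only through w, so the direct partial derivatives with respect to μ and ρ are zero. All names are hypothetical.

```python
import math


def softplus(rho):
    return math.log1p(math.exp(rho))


def f(w):
    """Toy differentiable function of the sampled weight (stands in for the loss)."""
    return (w - 1.0) ** 2


def df_dw(w):
    return 2.0 * (w - 1.0)


def analytic_grads(mu, rho, eps):
    """Chain rule through w = mu + softplus(rho) * eps for a fixed noise draw eps."""
    w = mu + softplus(rho) * eps
    grad_mu = df_dw(w)                                  # dw/dmu = 1
    grad_rho = df_dw(w) * eps / (1.0 + math.exp(-rho))  # dw/drho = eps * sigmoid(rho)
    return grad_mu, grad_rho


def numeric_grads(mu, rho, eps, h=1e-6):
    """Central finite differences, holding the noise draw eps fixed."""
    def loss(m, r):
        return f(m + softplus(r) * eps)

    grad_mu = (loss(mu + h, rho) - loss(mu - h, rho)) / (2.0 * h)
    grad_rho = (loss(mu, rho + h) - loss(mu, rho - h)) / (2.0 * h)
    return grad_mu, grad_rho


analytic = analytic_grads(mu=0.2, rho=-0.5, eps=0.7)
numeric = numeric_grads(mu=0.2, rho=-0.5, eps=0.7)
print("analytic:", analytic)
print("numeric: ", numeric)
```

In practice the framework's autograd performs this chain rule for you; the check only shows that nothing about the random multiplier prevents it.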

It is also true that there is a complexity cost function differentiable along its variables

It is known that the cross-entropy loss (and MSE) are differentiable. Therefore, if we prove that there is a complexity-cost function that is differentiable, we can leave it to our framework to take the derivatives and compute the gradients on the optimization step.

The complexity cost is calculated, on the feedforward operation, by each of the Bayesian layers (from the layer's pre-defined, simpler a priori distribution and its empirical distribution). The complexity costs of all layers are summed and added to the loss.

As proposed in the Weight Uncertainty in Neural Networks paper, we can gather the complexity cost of a distribution by taking the Kullback-Leibler divergence from it to a much simpler distribution, and, by making some approximations, we can differentiate this function relative to its variables (the distributions):

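  1. Let $P(w)$ be a low-entropy distribution pdf set by hand, which will be assumed as an "a priori" distribution for the weights

  2. Let $Q(w | \theta)$ be the a posteriori empirical distribution pdf for our sampled weights, given its parameters.

Therefore, for each scalar on the W sampled matrix:

  1. $D_{KL}(Q(w | \theta) \,\|\, P(w)) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n} Q(w^{(i)} | \theta) \left( \log Q(w^{(i)} | \theta) - \log P(w^{(i)}) \right)$

By assuming a very large n, we could approximate:

  1. $D_{KL}(Q(w | \theta) \,\|\, P(w)) = \frac{1}{n} \sum_{i=0}^{n} Q(w^{(i)} | \theta) \left( \log Q(w^{(i)} | \theta) - \log P(w^{(i)}) \right)$

and therefore:

  1. $D_{KL}(Q(w | \theta) \,\|\, P(w)) = \mu_{Q} \sum_{i=0}^{n} \left( \log Q(w^{(i)} | \theta) - \log P(w^{(i)}) \right)$

As the expected value (mean) of the Q distribution ends up just scaling the values, we can take it out of the equation (as there will be no framework tracing). We then have the complexity cost of the nth sample as:

  1. $C^{(n)}(w^{(n)}, \theta) = \log Q(w^{(n)} | \theta) - \log P(w^{(n)})$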

Which is differentiable relative to all of its parameters.
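As a sanity check on the per-sample complexity cost, the sketch below averages log Q(w | θ) − log P(w) over weights sampled from Q and compares the result with the closed-form KL divergence between two Gaussians. It assumes, for simplicity, a single zero-mean Gaussian prior (the library's prior may be a scale mixture instead); the function names are hypothetical.

```python
import math
import random


def log_normal_pdf(w, mean, std):
    """Log density of N(mean, std^2) evaluated at w."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - (w - mean) ** 2 / (2.0 * std ** 2)


def complexity_cost(w, mu, sigma, prior_std=1.0):
    """log Q(w | theta) - log P(w) for one sampled weight."""
    return log_normal_pdf(w, mu, sigma) - log_normal_pdf(w, 0.0, prior_std)


def closed_form_kl(mu, sigma, prior_std=1.0):
    """KL( N(mu, sigma^2) || N(0, prior_std^2) ), used here only as a reference value."""
    return math.log(prior_std / sigma) + (sigma ** 2 + mu ** 2) / (2.0 * prior_std ** 2) - 0.5


rng = random.Random(0)
mu, sigma = 0.5, 0.3

# Average the per-sample cost over many weights drawn from the posterior Q.
costs = [complexity_cost(rng.gauss(mu, sigma), mu, sigma) for _ in range(20000)]
mc_estimate = sum(costs) / len(costs)
reference = closed_form_kl(mu, sigma)
print(f"Monte Carlo estimate={mc_estimate:.4f} closed form={reference:.4f}")
```

The single-sample cost is noisy, but its average tracks the KL divergence, which is why it can serve as the complexity term even with few samples per step.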

To get the whole cost function at the nth sample:

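  1. Let a performance (fit to data) function be: $P^{(n)}(y^{(n)}, \hat{y}^{(n)}, w^{(n)})$

Therefore the whole cost function on the nth sample of weights will be:

  1. $L^{(n)}(w^{(n)}, \theta) = C^{(n)}(w^{(n)}, \theta) + P^{(n)}(y^{(n)}, \hat{y}^{(n)}, w^{(n)})$, where $C^{(n)}$ is the complexity cost of the nth sample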

We can estimate the true full cost function by Monte Carlo sampling it (feedforwarding the network X times and taking the mean over the full loss) and then backpropagate using our estimated value. It works for a low number of experiments per backprop and even for unitary experiments.
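The Monte Carlo estimate of the full cost can be sketched for a one-weight linear model y = w · x. This is an illustration of the idea only, not BLiTZ's sample_elbo: it assumes an MSE fit term, a standard Normal prior, the softplus mapping from ρ to σ, and a hypothetical complexity_cost_weight factor that scales the complexity term.

```python
import math
import random


def softplus(rho):
    return math.log1p(math.exp(rho))


def log_normal_pdf(w, mean, std):
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - (w - mean) ** 2 / (2.0 * std ** 2)


def elbo_estimate(xs, ys, mu, rho, sample_nbr=3, complexity_cost_weight=1.0, seed=0):
    """Average (fit + weighted complexity) over sample_nbr weight draws."""
    rng = random.Random(seed)
    sigma = softplus(rho)
    total = 0.0
    for _ in range(sample_nbr):
        w = mu + sigma * rng.gauss(0.0, 1.0)  # one weight sample per forward pass
        fit = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)  # MSE
        complexity = log_normal_pdf(w, mu, sigma) - log_normal_pdf(w, 0.0, 1.0)
        total += fit + complexity_cost_weight * complexity
    return total / sample_nbr


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]

loss = elbo_estimate(xs, ys, mu=2.0, rho=-3.0, sample_nbr=3,
                     complexity_cost_weight=1.0 / len(xs))
# With an almost deterministic weight at the true slope and no complexity term,
# the estimate collapses to (nearly) zero fit error.
fit_only = elbo_estimate(xs, ys, mu=2.0, rho=-20.0, complexity_cost_weight=0.0)
print(f"loss={loss:.4f} fit_only={fit_only:.2e}")
```

Raising sample_nbr lowers the variance of the estimate at the price of more forward passes per optimizer step.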

Some notes and wrap up

We came to the end of the Bayesian Deep Learning in a Nutshell tutorial. By knowing what is being done here, you can implement your BNN model as you wish.

Maybe you can optimize by doing one optimizer step per sample, or by using this Monte-Carlo-ish method to gather the loss several times, take its mean and then run the optimizer. Your move.

FYI: Our Bayesian layers and utils help to calculate the complexity cost along the layers on each feedforward operation, so don't worry about it too much.

References:

Citing

If you use BLiTZ in your research, you can cite it as follows:

@misc{esposito2020blitzbdl,
    author = {Piero Esposito},
    title = {BLiTZ - Bayesian Layers in Torch Zoo (a Bayesian Deep Learning library for Torch)},
    year = {2020},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/piEsposito/blitz-bayesian-deep-learning/}},
}
Special thanks to Intel Student Ambassador program
Made by Pi Esposito
Comments
  • Huge Memory Demand

    Hi, when I try to train a simple model using BLiTZ I get a huge memory demand (> 12 GB) for a very small dataset.

    @variational_estimator
    class BayesianLstm(Module):
        def __init__(self, output_size=1, input_size=24, hidden_size=32,
                     seq_length=30, hidden_neurons=8, batch_size=512):
            super(BayesianLstm, self).__init__()

            self.input_size = input_size
            self.hidden_size = hidden_size
            self.seq_length = seq_length
            self.hidden_layer_size = hidden_size

            # First lstm cell
            self.lstm1 = BayesianLSTM(input_size, hidden_size)
            # second lstm cell
            self.lstm2 = BayesianLSTM(hidden_size, hidden_size*2)
            # first fully connected layer
            self.fc1 = BayesianLinear(hidden_size * 2, hidden_neurons)
            self.act1 = nn.ReLU()
            # self.bat1 = nn.BatchNorm1d(num_features=hidden_neurons)
            self.drop = nn.Dropout(inplace=True, p=0.5)

            # second fully connected layer
            self.fc2 = BayesianLinear(hidden_neurons, hidden_neurons)
            self.act2 = nn.ReLU()
            # self.bat2 = nn.BatchNorm1d(num_features=hidden_neurons)

            # output
            self.output = BayesianLinear(hidden_neurons, output_size)
    

    My data is of shape [batchsize, sequence_length, number_features]. I tried this for batchsize = 512, sequence_length = 30, number_features = 24.

    opened by serop96 8
  • Bayesian MLP not learning for regression tasks

    Hi (and thanks a bunch for this framework!),

    I'm testing out a Bayesian neural net for a simple regression task. However, after a lot of training, when I test the output, I just get an (almost) constant output. I follow the same workflow as in the Boston Housing example, except I use a function to generate my dataset.

    Here's my code, if you're interested:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    import torch.tensor as Tensor
    import numpy as np
    import matplotlib.pyplot as plt
    
    from blitz.modules import BayesianLinear
    from blitz.utils import variational_estimator
    
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import train_test_split
    
    X = np.expand_dims(np.random.uniform(0,10,1000), -1)    # these two lines are the only thing I changed as far
    y = np.sin(X)                                                                            # as data preprocessing is concerned
    
    X = StandardScaler().fit_transform(X)
    y = StandardScaler().fit_transform(y)
    
    X_train, X_test, y_train, y_test = train_test_split(X,
                                                        y,
                                                        test_size=.25,
                                                        random_state=42)
    X_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()
    X_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()
    
    ds_train = torch.utils.data.TensorDataset(X_train, y_train)
    dataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)
    
    ds_test = torch.utils.data.TensorDataset(X_test, y_test)
    dataloader_test = torch.utils.data.DataLoader(ds_test, batch_size=16, shuffle=True)
    
    @variational_estimator
    class BayesianRegressor(nn.Module):
        def __init__(self):
            super().__init__()
            #self.linear = nn.Linear(input_dim, output_dim)
            self.blinear1 = BayesianLinear(1, 100)
            self.blinear2 = BayesianLinear(100, 100)
            self.blinear3 = BayesianLinear(100, 1)
            self.sigmoid = nn.Sigmoid()
            
        def forward(self, x_):
            x = self.sigmoid(self.blinear1(x_))
            x = self.sigmoid(self.blinear2(x))
            x = self.blinear3(x)
            return x
    
    regressor = BayesianRegressor().to(device)
    criterion = torch.nn.MSELoss()
    optimizer = optim.Adam(regressor.parameters(), lr=0.005)
    
    iteration = 0
    hist = []
    for epoch in range(1000):
        totalloss = 0
        u = 0
        for i, (datapoints, labels) in enumerate(dataloader_train):
            u += 1
            optimizer.zero_grad()
            
            loss = regressor.sample_elbo(inputs=datapoints.to(device),
                               labels=labels.to(device),
                               criterion=criterion,
                               sample_nbr=3,
                               complexity_cost_weight=1/X_train.shape[0])
            totalloss += loss.item()
            loss.backward()
            optimizer.step()
        hist.append(totalloss/u)
        print(f"[Epoch {epoch}] "+"Loss: {:.4f}".format(totalloss/u))
    

    Then I generate the outputs:

    plt.scatter(X,y)
    plt.scatter(X,regressor(Tensor(X).float().to(device)).detach().cpu(),s=5)
    plt.show()
    

    which gives me: this

    Is there something I'm doing wrong here?

    opened by tiwalayo 8
  • Do you have cuda support?

    Hi! If I run the model on GPU, it causes RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

    Do you have plans to add cuda support?

    opened by Archelunch 8
  • Extra -0.5 in log_posterior()?

    In https://github.com/piEsposito/blitz-bayesian-deep-learning/blob/master/blitz/modules/weight_sampler.py#L50,

    log_posteriors =  -log_sqrt2pi - torch.log(self.sigma) - (((w - self.mu) ** 2)/(2 * self.sigma ** 2)) - 0.5
    

    why is there a -0.5 at the end of the line? The log-likelihood of a Gaussian does not have that -0.5.

    opened by isaac-cfwong 7
  • Variance of predictions is too small

    Hi,

    I am trying to use BLiTZ for a regression problem. Unfortunately, my nets don't seem to learn the variance of the data correctly. The variance in the predictions is so low that the true value is never within a reasonable confidence interval around the mean of the predictions. However, the means of the predictions are acceptable and clearly show that the nets are learning something. For better understanding I attached a plot. What I have tried so far:

    • Different architectures with and without convolutional layers
    • Changing prior_sigma_1, prior_sigma_2 and prior_pi
    • Changing complexity_cost_weight in the sample_elbo method

    prediction density plot

    Best regards Lukas

    opened by LukasWFurtner 6
  • -inf in log_prior and nan in loss breaks training

    Hello, first of all amazing work, and thank you for this project! I'm trying to train a simple 3-layer NN and I encountered some problems I wanted to ask about. Here is my model:

    BayesianRegressor(
      (blinear1): BayesianLinear(
        (weight_sampler): GaussianVariational()
        (bias_sampler): GaussianVariational()
        (weight_prior_dist): ScaleMixturePrior()
        (bias_prior_dist): ScaleMixturePrior()
      )
      (relu): ReLU()
      (blinear2): BayesianLinear(
        (weight_sampler): GaussianVariational()
        (bias_sampler): GaussianVariational()
        (weight_prior_dist): ScaleMixturePrior()
        (bias_prior_dist): ScaleMixturePrior()
      )
      (relu2): ReLU()
      (blinear3): BayesianLinear(
        (weight_sampler): GaussianVariational()
        (bias_sampler): GaussianVariational()
        (weight_prior_dist): ScaleMixturePrior()
        (bias_prior_dist): ScaleMixturePrior()
      )
    )
    

    I'm training it on a dataset of flat/house prices I recently scraped, and I've encountered a problem I cannot seem to fully understand: after a few epochs, the loss returned by the model.sample_elbo method is sometimes equal to nan, which, when backpropagated, breaks the whole training, as some of the weights are 'optimized' to nans:

    model_copy.sample_elbo(inputs=datapoints.to(device),
                           labels=labels.to(device),
                           criterion=criterion,
                           sample_nbr=3,
                           complexity_cost_weight=1/X_train.shape[0])
    

    I managed to track down where the incorrect values first appear, before backpropagation of these nans, and it turned out that the value of log_prior in the first Bayesian layer is sometimes equal to -inf:

    first_layer = list(model_copy.modules())[0].blinear1
    first_layer .log_prior # returns -inf
    

    Going further, I checked that the problem is in weight_prior_dist, which sometimes, roughly one time in five, returns -inf:

    w =first_layer.weight_sampler.sample() #sampled weigths
    prior_dist = first_layer.weight_prior_dist 
    print(prior_dist.log_prior(w)) #sometimes returns -inf
    

    Going deeper, I realised that the problem is in prior_pdf of the first prior distribution in weight_prior_dist of the first layer. Some of the log-probabilities of the sampled weight values (prior_dist.dist1.log_prob(w)) are very small, around -100, and when passed through torch.exp such small values underflow to 0. When these zeros go through torch.log in prior_dist.log_prior(w) they become -inf, and the whole mean then becomes -inf, which corrupts the subsequent loss calculation:

    prob_n1 = torch.exp(prior_dist.dist1.log_prob(w)) # minimal value of this tensor is equal to 0
    if prior_dist.dist2 is not None:
        prob_n2 = torch.exp(prior_dist.dist2.log_prob(w))
    
    prior_pdf = (prior_dist.pi * prob_n1 + (1 - prior_dist.pi) * prob_n2) # minimal value of this tensor is equal to 0
    (torch.log(prior_pdf)).mean() #formula for calculating log_prior of weight_prior_dist, returns -inf
    

    If I understand correctly, it means that the probabilities of such sampled weights under the prior distribution are very, very small, approaching zero. Could you suggest a way of tackling this problem, so that they remain very small but not zero? Or maybe the problem is something else? I'm still learning the details of Bayesian DL, so I hope there aren't too many silly mistakes, and thank you for any kind of help!

    Best regards, Rafał

    opened by rafaljanwojcik 6
  • Specification of Additional Priors and Posterior MCMC

    Hey there! Just want to say that I am really impressed by your repo. Good work!

    Will there be options to expand the set of priors that may be accessible and posterior sampling methods such as that featured in https://arxiv.org/abs/1902.03932 ?

    Thanks!

    opened by jlevy44 6
  • ScaleMixturePrior::log_prior()

    In the code below the log_prior is calculated by taking the mean of the log probabilities, however the paper describes taking the product of these weights.

    https://github.com/piEsposito/blitz-bayesian-deep-learning/blob/566eaa495f28fe9a8c2b076a7388548aa4f792f0/blitz/modules/weight_sampler.py#L61-L72

    Would it perhaps be preferable to use something like:

    reduce(lambda a, b: a * b, log(prior_pdf))

    opened by danielkelshaw 6
  • Interface features both parts of the ELBO-loss (likelihood part and performance part)

    For my recent work about Bayesian Neural Networks for Time Series Forecasting in Hydrology I needed a slight modification of the BLiTZ-library where the function sample_elbo in variational_estimator.py does not only output the total loss, but the likelihood part and performance part individually. In addition, my version outputs the model predictions.

    As I think that this could be a sensible addition for public use of the library, I would like to propose this change here.

    Best regards, Jonas Fill

    opened by filljonas 5
  • Providing a minimal working example

    I really like the idea of this library. However, it is really hard to get started with it. For example, the minimal working example in the README doesn't work. No matter what I try, I cannot get it to improve the accuracy. The loss decreases towards zero, but the accuracy doesn't change. The problem is that it is totally unclear why this is the case. Is it because:

    • there is no non-linear activation function in the model?
    • Is the model too big for BNNs?
    • or is something else the problem?

    Any help would be appreciated.

    opened by j0rd1smit 5
  • GPU compatible if bias is set to zero conv2d fixed

    Bug for conv2d when bias is set to False: the bias was then generated with torch.zeros on the CPU while the other parameters were on CUDA. Now fixed, only for conv2d.

    opened by Hannan4252 5
  • Error in BayesianRNN class

    In the code of BayesianRNN, there are two self.bias = None assignments; the first one should be self.bias_mu = None, I suppose. This class also lacks the definition of self.bias_prior_dist, which is used in def sharpen_posterior(self, loss, input_shape).

    class BayesianRNN(BayesianModule):
        """
        implements base class for B-RNN to enable posterior sharpening
        """
        def __init__(self,
                     sharpen=False):
            super().__init__()
            
            self.weight_ih_mu = None
            self.weight_hh_mu = None
            self.bias = None
            
            self.weight_ih_sampler = None
            self.weight_hh_sampler = None
            self.bias_sampler = None
    
            self.weight_ih = None
            self.weight_hh = None
            self.bias = None
            
            self.sharpen = sharpen
            
            self.weight_ih_eta = None
            self.weight_hh_eta = None
            self.bias_eta = None
            self.ff_parameters = None
            self.loss_to_sharpen = None
            
        
        def sample_weights(self):
            pass
        
        def init_sharpen_parameters(self):
            if self.sharpen:
                self.weight_ih_eta = nn.Parameter(torch.Tensor(self.weight_ih_mu.size()))
                self.weight_hh_eta = nn.Parameter(torch.Tensor(self.weight_hh_mu.size()))
                self.bias_eta = nn.Parameter(torch.Tensor(self.bias_mu.size()))
                
                self.ff_parameters = []
    
                self.init_eta()
        
        def init_eta(self):
            stdv = 1.0 / math.sqrt(self.weight_hh_eta.shape[0]) #correspond to hidden_units parameter
            self.weight_ih_eta.data.uniform_(-stdv, stdv)
            self.weight_hh_eta.data.uniform_(-stdv, stdv)
            self.bias_eta.data.uniform_(-stdv, stdv)
        
        def set_loss_to_sharpen(self, loss):
            self.loss_to_sharpen = loss
        
        def sharpen_posterior(self, loss, input_shape):
            """
            sharpens the posterior distribution by using the algorithm proposed in
            @article{DBLP:journals/corr/FortunatoBV17,
              author    = {Meire Fortunato and
                           Charles Blundell and
                           Oriol Vinyals},
              title     = {Bayesian Recurrent Neural Networks},
              journal   = {CoRR},
              volume    = {abs/1704.02798},
              year      = {2017},
              url       = {http://arxiv.org/abs/1704.02798},
              archivePrefix = {arXiv},
              eprint    = {1704.02798},
              timestamp = {Mon, 13 Aug 2018 16:48:21 +0200},
              biburl    = {https://dblp.org/rec/journals/corr/FortunatoBV17.bib},
              bibsource = {dblp computer science bibliography, https://dblp.org}
            }
            """
            bs, seq_len, in_size = input_shape
            gradients = torch.autograd.grad(outputs=loss,
                                            inputs=self.ff_parameters,
                                            grad_outputs=torch.ones(loss.size()).to(loss.device),
                                            create_graph=True,
                                            retain_graph=True,
                                            only_inputs=True)
            
            grad_weight_ih, grad_weight_hh, grad_bias = gradients
            
            #to generate sigmas on the weight sampler
            _ = self.sample_weights()
            
            weight_ih_sharpened = self.weight_ih_mu - self.weight_ih_eta * grad_weight_ih + self.weight_ih_sampler.sigma
            weight_hh_sharpened = self.weight_hh_mu - self.weight_hh_eta * grad_weight_hh + self.weight_hh_sampler.sigma
            bias_sharpened = self.bias_mu - self.bias_eta * grad_bias + self.bias_sampler.sigma
            
            if self.bias is not None:
                b_log_posterior = self.bias_sampler.log_posterior(w=bias_sharpened)
                b_log_prior_ = self.bias_prior_dist.log_prior(bias_sharpened)
                
            else:
                b_log_posterior = b_log_prior = 0
            
            
            self.log_variational_posterior += (self.weight_ih_sampler.log_posterior(w=weight_ih_sharpened) + b_log_posterior + self.weight_hh_sampler.log_posterior(w=weight_hh_sharpened)) / seq_len
            
            self.log_prior += self.weight_ih_prior_dist.log_prior(weight_ih_sharpened) + b_log_prior + self.weight_hh_prior_dist.log_prior(weight_hh_sharpened) / seq_len
            
            return weight_ih_sharpened, weight_hh_sharpened, bias_sharpened
    
    opened by LINGLONGQIAN 1
  • Question about a parameter of model.sample_elbo

    Hi, I'm grateful for your code. I have a question about "complexity_cost_weight". According to your code, I think the "complexity cost weight" should be 1/batch_size. However, in your example "LeNet_MNIST.py", the "complexity cost weight" is 1/50000, and 50000 is the size of the whole training set. So how should I choose the parameter: the batch size or the size of the whole training set?

    opened by KQL11 0
  • Radial BNN implementation is wrong

    I do not see the radial factor "r" to which the normalized epsilon is projected. Kindly help me in this regard. Moreover, "r" is to be sampled from a Normal distribution.

    opened by ishan-m 0
  • Questions about prior distributions

    Can this library only use a Gaussian distribution as the prior? Are other prior distributions available? Also, if the input data and the prior distribution are both Gaussian, is the output of the network Gaussian as well?

    opened by closeyourmoise 2
  • Result seems poor on the Boston example

    Hi, I tried to run the Boston example and plotted the fit, but the result does not seem good enough.

    The following are the training and validation results (mean with 3 times the standard deviation). (images: train_bosten_result_120, val_bosten_result_120)

    So I printed the criterion from the ELBO function, but the MSE curve oscillates instead of showing the expected descending trend. (image: bosten_loss_120)

    I am curious why the MSE is not decreasing. Looking forward to your reply :)

    opened by hangzhang23 0
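On the "complexity_cost_weight" question above, the arithmetic can be checked with a small sketch. It assumes, hypothetically, that the per-batch objective is a mean-reduced data loss plus the weighted KL term; this is not a statement of how BLiTZ computes `sample_elbo` internally. Every function name and number below is made up for illustration.

```python
# Illustrative sketch only: every name and number here is hypothetical.
# Assumed per-batch objective: mean NLL over the batch + weight * KL.

def per_batch_objective(kl, nll_per_sample, complexity_cost_weight):
    """Mean data loss over the batch plus the weighted complexity (KL) term."""
    mean_nll = sum(nll_per_sample) / len(nll_per_sample)
    return mean_nll + complexity_cost_weight * kl


def kl_count_per_epoch(n_train, batch_size, complexity_cost_weight):
    """How many times the KL term is counted over one epoch, after
    rescaling each mean-reduced batch objective back to a per-sample sum."""
    n_batches = n_train // batch_size
    return n_batches * batch_size * complexity_cost_weight


n_train, batch_size = 50_000, 100
kl = 1234.5                  # made-up KL value
nll = [0.7] * batch_size     # made-up, identical per-sample losses

# Full-data objective (KL + sum of all NLLs), normalised per sample.
full_per_sample = (kl + n_train * 0.7) / n_train
batch_value = per_batch_objective(kl, nll, 1 / n_train)

print(abs(full_per_sample - batch_value) < 1e-9)
print(kl_count_per_epoch(n_train, batch_size, 1 / n_train))
print(kl_count_per_epoch(n_train, batch_size, 1 / batch_size))
```

Under this assumption, a weight of 1/N (N being the training-set size) counts the KL term about once per epoch, whereas 1/batch_size counts it once per batch. If the data loss were sum-reduced rather than mean-reduced, the matching weight would be 1/(number of batches) instead.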
Releases(0.2.8)
Owner
Pi Esposito
Software Engineer, MLOps @ unico. Google Developer Expert in Machine Learning