Implementing DropPath/StochasticDepth in PyTorch


DropPath is available in glasses, my computer vision library!

Introduction

Today we are going to implement Stochastic Depth, also known as Drop Path, in PyTorch! Stochastic Depth, introduced by Gao Huang et al., is a technique to "deactivate" some layers during training.

Let's take a look at a normal ResNet block that uses residual connections (like almost all models now). If you are not familiar with ResNet, I have an article showing how to implement it.

Basically, the block's output is added to its input: output = block(input) + input. This is called a residual connection.

[Figure: a ResNet block with a residual connection]
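If you prefer code, a residual connection can be written as a tiny wrapper module (a sketch of mine, not from the linked article):

import torch
from torch import nn, Tensor


class Residual(nn.Module):
    # wraps any block and adds its input to its output
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block

    def forward(self, x: Tensor) -> Tensor:
        return self.block(x) + x


Residual(nn.Identity())(torch.ones(2))  # tensor([2., 2.])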

Here we see four ResNet-like blocks, one after the other.

[Figure: four ResNet-like blocks in sequence]

Stochastic Depth/Drop Path will deactivate some of the blocks entirely.

[Figure: the same blocks with some randomly deactivated]

The idea is to reduce the number of layers/blocks used during training, saving time and making the network generalize better.

Practically, this means setting the block's output to zero before the addition.
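In toy numbers (my illustration, not the real implementation yet), zeroing the branch collapses the residual sum to the identity:

import torch

x = torch.ones(2)
block_out = torch.full_like(x, 3.)      # pretend this is block(x)
print(x + block_out)                    # block kept:    tensor([4., 4.])
print(x + torch.zeros_like(block_out))  # block dropped: tensor([1., 1.]) == x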

Implementation

Let's start by importing our best friend, torch.

import torch
from torch import nn
from torch import Tensor

We can define a 4D tensor (batch x channels x height x width); in our case, let's just send 4 images with one pixel each :)

x = torch.ones((4, 1, 1, 1))

We need a tensor of shape batch x 1 x 1 x 1 that will be used to set some of the elements in the batch to zero, with a given probability. Bernoulli to the rescue!

keep_prob: float = .5
mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    
mask
tensor([[[[0.]]],


        [[[1.]]],


        [[[1.]]],


        [[[1.]]]])

Btw, this is equivalent to

mask: Tensor = (torch.rand(x.shape[0], 1, 1, 1) < keep_prob).float()
mask
tensor([[[[1.]]],


        [[[1.]]],


        [[[1.]]],


        [[[1.]]]])

Before we multiply x by the mask, we need to divide x by keep_prob, so that the expected value of the activations stays the same as without dropping (the "inverted dropout" trick, see cs231n). So

x_scaled: Tensor = x / keep_prob
x_scaled
tensor([[[[2.]]],


        [[[2.]]],


        [[[2.]]],


        [[[2.]]]])

Finally

output: Tensor = x_scaled * mask
output
tensor([[[[2.]]],


        [[[2.]]],


        [[[2.]]],


        [[[2.]]]])
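To convince ourselves that this rescaling preserves the expected activation (a quick sanity check of mine, not part of the original walkthrough), we can average over many sampled masks:

torch.manual_seed(0)
samples = torch.stack([
    (x / keep_prob) * x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    for _ in range(10_000)
])
print(samples.mean())  # ≈ 1.0, the same expected value as x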

We can put it all together in a function

def drop_path(x: Tensor, keep_prob: float = 1.0) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    x_scaled: Tensor = x / keep_prob
    return x_scaled * mask

drop_path(x, keep_prob=0.5)
tensor([[[[0.]]],


        [[[0.]]],


        [[[2.]]],


        [[[0.]]]])

We can also do the operation in place

def drop_path(x: Tensor, keep_prob: float = 1.0) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    x.div_(keep_prob)
    x.mul_(mask)
    return x


drop_path(x, keep_prob=0.5)
tensor([[[[2.]]],


        [[[2.]]],


        [[[0.]]],


        [[[0.]]]])
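One caveat with the in-place version (my note): it mutates the input tensor, so any other reference to x sees the change too.

x = torch.ones((4, 1, 1, 1))
res = x  # res shares storage with x
drop_path(x, keep_prob=0.5)
print(res)  # res was modified as well!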

However, we may want to use x somewhere else, and dividing x or mask by keep_prob gives the same result, so we can scale the mask instead and leave x alone by default. Let's arrive at the final implementation

def drop_path(x: Tensor, keep_prob: float = 1.0, inplace: bool = False) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    mask.div_(keep_prob)
    if inplace:
        x.mul_(mask)
    else:
        x = x * mask
    return x

x = torch.ones((4, 1, 1, 1))
drop_path(x, keep_prob=0.8)
tensor([[[[1.2500]]],


        [[[1.2500]]],


        [[[1.2500]]],


        [[[1.2500]]]])
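Note that the surviving samples are scaled by 1 / keep_prob = 1 / 0.8 = 1.25, which is exactly the 1.2500 we see above (in this run no sample happened to be dropped).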

So far drop_path only works for 4D tensors (2D image data); we need to derive the mask's number of dimensions from the input's shape to make it work for any data type

from typing import Tuple

def drop_path(x: Tensor, keep_prob: float = 1.0, inplace: bool = False) -> Tensor:
    mask_shape: Tuple[int, ...] = (x.shape[0],) + (1,) * (x.ndim - 1)
    # remember tuples have the * operator -> (1,) * 3 = (1,1,1)
    mask: Tensor = x.new_empty(mask_shape).bernoulli_(keep_prob)
    mask.div_(keep_prob)
    if inplace:
        x.mul_(mask)
    else:
        x = x * mask
    return x

x = torch.ones((4, 1))
drop_path(x, keep_prob=0.8)
tensor([[0.],
        [0.],
        [0.],
        [0.]])
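The generalized version now also handles sequence data, e.g. a batch of token embeddings like the ones used in Transformers (the shapes below are made up for illustration):

tokens = torch.ones((4, 16, 32))  # batch x seq_len x embed_dim
out = drop_path(tokens, keep_prob=0.8)
out.shape  # torch.Size([4, 16, 32]); each sample is zeroed or scaled by 1 / 0.8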

Let's create a nice DropPath nn.Module

class DropPath(nn.Module):
    def __init__(self, p: float = 0.5, inplace: bool = False):
        super().__init__()
        # note: p is the *keep* probability that gets passed to drop_path
        self.p = p
        self.inplace = inplace

    def forward(self, x: Tensor) -> Tensor:
        if self.training and self.p > 0:
            x = drop_path(x, self.p, self.inplace)
        return x

    def __repr__(self):
        return f"{self.__class__.__name__}(p={self.p})"

    
DropPath()(torch.ones((4, 1)))
tensor([[2.],
        [0.],
        [0.],
        [0.]])
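Since the masking is guarded by self.training, the module is a no-op at inference time (a quick check, my addition):

dp = DropPath(p=0.5)
dp.eval()  # sets self.training to False
dp(torch.ones((4, 1)))  # unchanged: all ones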

Usage with Residual Connections

We have our DropPath, cool, but how do we use it? We need a classic ResNet block; let's implement our good old friend, the BottleNeck block.

from torch import nn


class ConvBnAct(nn.Sequential):
    def __init__(self, in_features: int, out_features: int, kernel_size=1):
        super().__init__(
            nn.Conv2d(in_features, out_features, kernel_size=kernel_size, padding=kernel_size // 2),
            nn.BatchNorm2d(out_features),
            nn.ReLU()
        )
         

class BottleNeck(nn.Module):
    def __init__(self, in_features: int, out_features: int, reduction: int = 4):
        super().__init__()
        self.block = nn.Sequential(
            # wide -> narrow
            ConvBnAct(in_features, out_features // reduction, kernel_size=1),
            # narrow -> narrow
            ConvBnAct(out_features // reduction, out_features // reduction, kernel_size=3),
            # narrow -> wide
            ConvBnAct(out_features // reduction, out_features, kernel_size=1),
        )
        # I am lazy, no shortcut etc
        
    def forward(self, x: Tensor) -> Tensor:
        res = x
        x = self.block(x)
        return x + res
    
    
BottleNeck(64, 64)(torch.ones((1, 64, 28, 28))).shape
torch.Size([1, 64, 28, 28])

To deactivate the block, the operation x + res must be equal to res, so our DropPath has to be applied after the block.

class BottleNeck(nn.Module):
    def __init__(self, in_features: int, out_features: int, reduction: int = 4):
        super().__init__()
        self.block = nn.Sequential(
            # wide -> narrow
            ConvBnAct(in_features, out_features // reduction, kernel_size=1),
            # narrow -> narrow
            ConvBnAct(out_features // reduction, out_features // reduction, kernel_size=3),
            # narrow -> wide
            ConvBnAct(out_features // reduction, out_features, kernel_size=1),
        )
        # I am lazy, no shortcut etc
        self.drop_path = DropPath()
        
    def forward(self, x: Tensor) -> Tensor:
        res = x
        x = self.block(x)
        x = self.drop_path(x)
        return x + res
    
BottleNeck(64, 64)(torch.ones((1, 64, 28, 28)))
tensor([[[[1.0009, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          ...,
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         [[1.0005, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          ...,
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0011, 1.0011,  ..., 1.0011, 1.0011, 1.0247]],

         [[1.0203, 1.0123, 1.0123,  ..., 1.0123, 1.0123, 1.0299],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          ...,
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         ...,

         [[1.0011, 1.0180, 1.0180,  ..., 1.0180, 1.0180, 1.0465],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          ...,
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         [[1.0130, 1.0170, 1.0170,  ..., 1.0170, 1.0170, 1.0213],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          ...,
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0012, 1.0139, 1.0139,  ..., 1.0139, 1.0139, 1.0065]],

         [[1.0103, 1.0181, 1.0181,  ..., 1.0181, 1.0181, 1.0539],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          ...,
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]]]],
       grad_fn=<AddBackward0>)

Tada 🎉! Now, randomly, our .block will be completely skipped!
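As a final sanity check (my addition, reusing the classes defined above), we can count how often the residual branch is dropped, leaving the input untouched:

torch.manual_seed(0)
bottleneck = BottleNeck(64, 64)
x = torch.ones((1, 64, 28, 28))
# with the default p=0.5 and batch size 1, roughly half of the
# forward passes should skip the block and return x unchanged
with torch.no_grad():
    skipped = sum(torch.equal(bottleneck(x), x) for _ in range(100))
print(f"block skipped in {skipped}/100 forward passes")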

