Training Very Deep Neural Networks Without Skip-Connections

Overview

DiracNets

v2 update (January 2018):

The code was updated for DiracNets-v2, in which we removed NCReLU by adding per-channel a and b multipliers without weight decay. This allowed us to significantly simplify the network, which now folds into a simple chain of convolution-ReLU layers, like VGG. On ImageNet, DiracNet-18 and DiracNet-34 closely match the corresponding ResNets with the same number of parameters.
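A minimal sketch of what "per-channel multipliers without weight decay" can look like with a standard PyTorch optimizer, using per-parameter option groups. The layer `ScaledConv`, its parameter names (`weight`, `alpha`, `beta`), their initial values and the optimizer hyperparameters are hypothetical illustrations, not taken from train.py, which may organize this differently.

```python
import torch
import torch.nn as nn


class ScaledConv(nn.Module):
    """Hypothetical layer holding a conv weight plus per-channel multipliers.

    forward() is omitted: only the parameter grouping matters in this sketch.
    """

    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.01)
        self.alpha = nn.Parameter(torch.ones(channels))         # per-channel multiplier
        self.beta = nn.Parameter(torch.full((channels,), 0.1))  # per-channel multiplier


def param_groups(model, weight_decay):
    """Split parameters so that the alpha/beta multipliers get no weight decay."""
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        # Names look like "0.alpha"; match on the last dotted component.
        if name.rsplit(".", 1)[-1] in ("alpha", "beta"):
            no_decay.append(p)
        else:
            decay.append(p)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


model = nn.Sequential(ScaledConv(16), nn.ReLU(), ScaledConv(16))
optimizer = torch.optim.SGD(param_groups(model, 1e-4), lr=0.1, momentum=0.9)
```

Per-group `weight_decay` overrides the optimizer default, so only the convolution weights are regularized here.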

See the v1 branch for DiracNet-v1.


PyTorch code and models for DiracNets: Training Very Deep Neural Networks Without Skip-Connections

https://arxiv.org/abs/1706.00388

Networks with skip-connections like ResNet show excellent performance in image recognition benchmarks, but do not benefit from increased depth; we are thus still interested in learning actually deep representations and the benefits they could bring. We propose a simple weight parameterization that improves training of deep plain networks (without skip-connections) and allows training plain networks with hundreds of layers. The accuracy of our proposed DiracNets is close to that of Wide ResNet (although DiracNets need more parameters to achieve it), and we are able to match ResNet-1000 accuracy with a plain DiracNet of only 28 layers. Also, the proposed Dirac weight parameterization can be folded into one filter for inference, leading to an easily interpretable VGG-like network.

DiracNets on ImageNet:

TL;DR

In a nutshell, the Dirac parameterization is a sum of the filters and a scaled Dirac delta function:

conv2d(x, alpha * delta + W)
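Because convolution is linear in its weights, the Dirac term acts like an implicit skip-connection: with delta as the identity kernel and padding that preserves spatial size, conv2d(x, alpha * delta + W) equals alpha * x + conv2d(x, W). The sketch below checks this numerically, building delta with `torch.nn.init.dirac_`; the tensor sizes and the scalar alpha are arbitrary illustrative choices.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
channels, k = 8, 3
x = torch.randn(2, channels, 16, 16)
W = torch.randn(channels, channels, k, k) * 0.1

# Identity ("Dirac delta") kernel: 1 at the spatial centre of filter i, input channel i.
delta = torch.empty(channels, channels, k, k)
torch.nn.init.dirac_(delta)

alpha = 0.7
pad = k // 2  # "same" padding so the identity kernel maps x onto itself

lhs = F.conv2d(x, alpha * delta + W, padding=pad)
rhs = alpha * x + F.conv2d(x, W, padding=pad)
print(torch.allclose(lhs, rhs, atol=1e-5))
```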

Here is simplified PyTorch-like pseudocode for the function we use to train plain DiracNets (with weight normalization):

def dirac_conv2d(input, W, alpha, beta):
    return F.conv2d(input, alpha * dirac(W) + beta * normalize(W))

where alpha and beta are per-channel scaling multipliers, and normalize performs L2 normalization over each feature plane.
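A runnable version of the pseudocode above might look like the following. The helper names `dirac` and `normalize` follow the pseudocode, but their bodies are assumptions: `dirac` builds an identity kernel with `torch.nn.init.dirac_`, and `normalize` L2-normalizes each output filter (flattened over input channels and spatial positions), which is one reading of "each feature plane". The per-channel alpha and beta are reshaped to broadcast over the output-channel dimension, and `padding` is an added parameter so that 3x3 filters preserve the spatial size.

```python
import torch
import torch.nn.functional as F


def dirac(W):
    """Identity kernel with the same shape as W: (out, in, kh, kw)."""
    delta = torch.zeros_like(W)
    torch.nn.init.dirac_(delta)
    return delta


def normalize(W):
    """L2-normalize each output filter of W (assumed reading of 'feature plane')."""
    flat = W.reshape(W.shape[0], -1)
    return F.normalize(flat, p=2, dim=1).view_as(W)


def dirac_conv2d(input, W, alpha, beta, padding=1):
    # alpha, beta have shape (out_channels,); reshape to broadcast over (out, in, kh, kw).
    a = alpha.view(-1, 1, 1, 1)
    b = beta.view(-1, 1, 1, 1)
    return F.conv2d(input, a * dirac(W) + b * normalize(W), padding=padding)


x = torch.randn(1, 4, 8, 8)
W = torch.randn(4, 4, 3, 3)
y = dirac_conv2d(x, W, alpha=torch.ones(4), beta=torch.full((4,), 0.1))
print(y.shape)  # torch.Size([1, 4, 8, 8])
```

With beta set to zero and alpha to one, the layer reduces to the identity, which is why a deep stack of such layers starts out close to an identity mapping.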

Code

Code structure:

├── README.md # this file
├── diracconv.py # modular DiracConv definitions
├── test.py # unit tests
├── diracnet-export.ipynb # ImageNet pretrained models
├── diracnet.py # functional model definitions
└── train.py # CIFAR and ImageNet training code

Requirements

First install PyTorch, then install torchnet:

pip install git+https://github.com/pytorch/tnt.git@master

Install other Python packages:

pip install -r requirements.txt

To train DiracNet-34-2 on CIFAR do:

python train.py --save ./logs/diracnets_$RANDOM$RANDOM --depth 34 --width 2

To train DiracNet-18 on ImageNet do:

python train.py --dataroot ~/ILSVRC2012/ --dataset ImageNet --depth 18 --save ./logs/diracnet_$RANDOM$RANDOM \
                --batchSize 256 --epoch_step [30,60,90] --epochs 100 --weightDecay 0.0001 --lr_decay_ratio 0.1
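One reading of the ImageNet flags is a step schedule: the learning rate is multiplied by `--lr_decay_ratio` at each epoch listed in `--epoch_step`. The sketch below encodes that assumption as a pure function; the base learning rate of 0.1 is hypothetical (the command above does not set it), and whether train.py counts epochs from 0 or 1 is not checked here.

```python
def step_lr(epoch, base_lr, epoch_step, lr_decay_ratio):
    """Learning rate at a 0-indexed epoch under an assumed step schedule."""
    n_decays = sum(epoch >= e for e in epoch_step)
    return base_lr * lr_decay_ratio ** n_decays


for epoch in (0, 29, 30, 60, 90, 99):
    print(epoch, step_lr(epoch, base_lr=0.1, epoch_step=[30, 60, 90], lr_decay_ratio=0.1))
```

In plain PyTorch, the same schedule is available as `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)`.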

nn.Module code

We provide DiracConv1d, DiracConv2d, and DiracConv3d, which work like nn.Conv1d, nn.Conv2d, and nn.Conv3d but have the Dirac parameterization inside (our training code does not use these modules, though).
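For illustration, a module in the spirit of DiracConv2d can subclass nn.Conv2d and compute its effective weight at forward time. This is a hypothetical sketch, not the code in diracconv.py: the class name `DiracConv2dSketch`, the initial values alpha = 1 and beta = 0.1, and the per-filter normalization are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiracConv2dSketch(nn.Conv2d):
    """nn.Conv2d whose effective weight is alpha * delta + beta * normalize(W)."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__(in_channels, out_channels, kernel_size, **kwargs)
        self.alpha = nn.Parameter(torch.ones(out_channels))
        self.beta = nn.Parameter(torch.full((out_channels,), 0.1))
        delta = torch.empty_like(self.weight)
        nn.init.dirac_(delta)  # identity kernel, as many channels as possible
        self.register_buffer("delta", delta)  # fixed, moves with .to(device)

    def effective_weight(self):
        w = self.weight
        w_norm = F.normalize(w.reshape(w.shape[0], -1), dim=1).view_as(w)
        a = self.alpha.view(-1, 1, 1, 1)
        b = self.beta.view(-1, 1, 1, 1)
        return a * self.delta + b * w_norm

    def forward(self, x):
        # Reuse the stride/padding/dilation/groups configured on nn.Conv2d.
        return F.conv2d(x, self.effective_weight(), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)


conv = DiracConv2dSketch(16, 16, 3, padding=1, bias=False)
y = conv(torch.randn(2, 16, 32, 32))
print(y.shape)  # torch.Size([2, 16, 32, 32])
```

Keeping delta as a buffer rather than a parameter means it is saved with the state dict but never updated by the optimizer.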

Pretrained models

We fold batch normalization and the Dirac parameterization into the F.conv2d weight and bias tensors for simplicity. The resulting models are as simple as VGG or AlexNet, with only nonlinearity+conv2d as the basic block.
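A sketch of the folding arithmetic, under assumptions about the block layout: the Dirac-parameterized weight is first collapsed into one tensor W_dirac = alpha * delta + beta * normalize(W), and an inference-mode batch normalization following the convolution (scale gamma, shift, running mean mu, running variance var) is then absorbed as W' = W_dirac * gamma / sqrt(var + eps) and b' = shift - mu * gamma / sqrt(var + eps). The exact layer order in the released models may differ; the check below only confirms that the folded convolution reproduces conv followed by BN. All tensors are random stand-ins for trained values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C, k, eps = 8, 3, 1e-5

# Dirac-parameterized weight collapsed into a single tensor.
W = torch.randn(C, C, k, k)
alpha, beta = torch.rand(C) + 0.5, torch.rand(C) * 0.2
delta = torch.empty_like(W)
torch.nn.init.dirac_(delta)
W_norm = F.normalize(W.reshape(C, -1), dim=1).view_as(W)
W_dirac = alpha.view(-1, 1, 1, 1) * delta + beta.view(-1, 1, 1, 1) * W_norm

# Batch-norm affine parameters and running statistics.
gamma, shift = torch.rand(C) + 0.5, torch.randn(C)
running_mean, running_var = torch.randn(C), torch.rand(C) + 0.5

# Fold BN: scale each output filter and turn the normalization shift into a bias.
scale = gamma / torch.sqrt(running_var + eps)
W_folded = W_dirac * scale.view(-1, 1, 1, 1)
b_folded = shift - running_mean * scale

x = torch.randn(2, C, 10, 10)
reference = F.batch_norm(F.conv2d(x, W_dirac, padding=1), running_mean, running_var,
                         weight=gamma, bias=shift, training=False, eps=eps)
folded = F.conv2d(x, W_folded, b_folded, padding=1)
print(torch.allclose(reference, folded, atol=1e-4))
```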

See diracnets.ipynb for functional and modular model definitions.

There is also a folded DiracNet definition in diracnet.py, which uses code from the PyTorch model_zoo and downloads the pretrained model from Amazon S3:

from diracnet import diracnet18
model = diracnet18(pretrained=True)

Printout of the model above:

DiracNet(
  (features): Sequential(
    (conv): Conv2d (3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
    (max_pool0): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1), ceil_mode=False)
    (group0.block0.relu): ReLU()
    (group0.block0.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group0.block1.relu): ReLU()
    (group0.block1.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group0.block2.relu): ReLU()
    (group0.block2.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group0.block3.relu): ReLU()
    (group0.block3.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (max_pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
    (group1.block0.relu): ReLU()
    (group1.block0.conv): Conv2d (64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group1.block1.relu): ReLU()
    (group1.block1.conv): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group1.block2.relu): ReLU()
    (group1.block2.conv): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group1.block3.relu): ReLU()
    (group1.block3.conv): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (max_pool2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
    (group2.block0.relu): ReLU()
    (group2.block0.conv): Conv2d (128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group2.block1.relu): ReLU()
    (group2.block1.conv): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group2.block2.relu): ReLU()
    (group2.block2.conv): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group2.block3.relu): ReLU()
    (group2.block3.conv): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (max_pool3): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
    (group3.block0.relu): ReLU()
    (group3.block0.conv): Conv2d (256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group3.block1.relu): ReLU()
    (group3.block1.conv): Conv2d (512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group3.block2.relu): ReLU()
    (group3.block2.conv): Conv2d (512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group3.block3.relu): ReLU()
    (group3.block3.conv): Conv2d (512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (last_relu): ReLU()
    (avg_pool): AvgPool2d(kernel_size=7, stride=7, padding=0, ceil_mode=False, count_include_pad=True)
  )
  (fc): Linear(in_features=512, out_features=1000)
)
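The printout describes a plain chain with no skip-connections. A minimal sketch that rebuilds the same layer sequence from standard PyTorch modules, useful for checking shapes, is shown below. The function name `folded_diracnet18` is hypothetical, the weights are randomly initialized rather than loaded from the pretrained checkpoint, and dots in the printed layer names are replaced by underscores because current PyTorch rejects submodule names containing ".".

```python
from collections import OrderedDict

import torch
import torch.nn as nn


def folded_diracnet18(num_classes=1000):
    """Plain ReLU/conv chain matching the DiracNet-18 printout (random weights)."""
    layers = OrderedDict()
    layers["conv"] = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
    layers["max_pool0"] = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
    in_ch = 64
    for g, out_ch in enumerate([64, 128, 256, 512]):
        if g > 0:
            # Downsampling between groups is done by pooling, not strided convs.
            layers[f"max_pool{g}"] = nn.MaxPool2d(kernel_size=2, stride=2)
        for b in range(4):
            layers[f"group{g}_block{b}_relu"] = nn.ReLU()
            layers[f"group{g}_block{b}_conv"] = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
            in_ch = out_ch
    layers["last_relu"] = nn.ReLU()
    layers["avg_pool"] = nn.AvgPool2d(kernel_size=7)  # 7x7 map -> 1x1 for 224x224 input
    return nn.Sequential(OrderedDict([
        ("features", nn.Sequential(layers)),
        ("flatten", nn.Flatten()),
        ("fc", nn.Linear(512, num_classes)),
    ]))


model = folded_diracnet18().eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```

Counting weight layers gives 17 convolutions plus one linear layer, i.e. the 18 layers in the name DiracNet-18.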

The models were trained with OpenCV, so you need to use it as well to reproduce the stated accuracy.

Pretrained weights for DiracNet-18 and DiracNet-34:
https://s3.amazonaws.com/modelzoo-networks/diracnet18v2folded-a2174e15.pth
https://s3.amazonaws.com/modelzoo-networks/diracnet34v2folded-dfb15d34.pth

Pretrained weights for the original (not folded) model, functional definition only:
https://s3.amazonaws.com/modelzoo-networks/diracnet18-v2_checkpoint.pth
https://s3.amazonaws.com/modelzoo-networks/diracnet34-v2_checkpoint.pth

We plan to add more pretrained models later.

Bibtex

@inproceedings{Zagoruyko2017diracnets,
    author = {Sergey Zagoruyko and Nikos Komodakis},
    title = {DiracNets: Training Very Deep Neural Networks Without Skip-Connections},
    url = {https://arxiv.org/abs/1706.00388},
    year = {2017}}