Instance-based label smoothing for improving deep neural networks generalization and calibration

Last update: Aug 13, 2022

Overview

Instance-based Label Smoothing for Neural Networks

PyTorch implementation of the algorithm.
This repository implements a proposed method for instance-based label smoothing in neural networks, in which the target probability is not distributed uniformly among the incorrect classes. Instead, each incorrect class is assigned a target probability proportional to the output score of that class relative to all the remaining classes, as predicted by a network trained with vanilla cross-entropy loss on the hard target labels.
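The target construction described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the repository's code: the function name `instance_based_targets`, the fixed smoothing factor `alpha`, and the example teacher probabilities are all hypothetical, and the actual implementation operates on PyTorch tensors.

```python
def instance_based_targets(teacher_probs, true_class, alpha=0.1):
    """Build a smoothed target from a teacher's predicted probabilities.

    teacher_probs: class probabilities from a network trained with plain
                   cross-entropy on hard labels (assumed to be available).
    true_class:    index of the correct label.
    alpha:         total probability mass moved to the incorrect classes
                   (hypothetical fixed value, chosen for illustration).
    """
    # Mass the teacher assigns to all incorrect classes combined.
    wrong_mass = sum(p for k, p in enumerate(teacher_probs) if k != true_class)
    num_wrong = len(teacher_probs) - 1

    target = []
    for k, p in enumerate(teacher_probs):
        if k == true_class:
            # The correct class keeps the remaining mass.
            target.append(1.0 - alpha)
        elif wrong_mass > 0:
            # Share of alpha proportional to the teacher's score for class k
            # relative to the other incorrect classes.
            target.append(alpha * p / wrong_mass)
        else:
            # Degenerate teacher (all mass on the true class): fall back to
            # a uniform split among the incorrect classes.
            target.append(alpha / num_wrong)
    return target


# Hypothetical teacher output for a 4-class problem whose true class is 0.
teacher = [0.70, 0.20, 0.08, 0.02]
print(instance_based_targets(teacher, true_class=0, alpha=0.1))
```

In this sketch the incorrect classes keep the same ordering and relative ratios as in the teacher's prediction, which is how the class similarity structure is carried into the training target.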

The following figure summarizes the idea of our instance-based label smoothing, which aims to preserve the information about the class similarity structure while training with label smoothing.

Requirements

Python 3.x
pandas
numpy
pytorch

Usage

Datasets

CIFAR10 / CIFAR100 / FashionMNIST

Files Content

The project has the following structure:

├── Vanilla-cross-entropy.py
├── Label-smoothing.py
├── Instance-based-smoothing.py
├── Models-evaluation.py
├── Network-distillation.py
├── utils
│   ├── data_loader.py
│   ├── utils.py
│   ├── evaluate.py
│   ├── params.json
├── models
│   ├── resnet.py
│   ├── densenet.py
│   ├── inception.py
│   ├── shallownet.py

Vanilla-cross-entropy.py is the file used for training the networks using cross-entropy without label smoothing.
Label-smoothing.py is the file used for training the networks using cross-entropy with standard label smoothing.
Instance-based-smoothing.py is the file used for training the networks using cross-entropy with instance-based label smoothing.
Models-evaluation.py is the file used for evaluation of the trained networks.
Network-distillation.py is the file used for distillation of trained networks into a shallow convolutional network of 5 layers.
models/ contains the implementations of the architectures used in our evaluation (ResNet, DenseNet, Inception-V4), as well as the shallow CNN student network used in the distillation experiments.
utils/ contains the utility functions required for training and evaluating the different models.
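To make the difference between the first two training scripts concrete, the sketch below builds a hard one-hot target and a standard label-smoothing target, and evaluates cross-entropy against either one. It is a framework-free illustration, not the repository's code: the helper names and the smoothing factor `alpha` are assumptions, and the convention used here (spreading `alpha` evenly over the incorrect classes only) is one of two common variants; the other spreads it over all classes.

```python
import math


def one_hot(true_class, num_classes):
    """Hard target used by vanilla cross-entropy training."""
    return [1.0 if k == true_class else 0.0 for k in range(num_classes)]


def uniform_smoothed(true_class, num_classes, alpha=0.1):
    """Standard label smoothing: alpha is split evenly among incorrect classes."""
    share = alpha / (num_classes - 1)
    return [1.0 - alpha if k == true_class else share for k in range(num_classes)]


def soft_cross_entropy(logits, target):
    """Cross-entropy between a target distribution and softmax(logits)."""
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, target))


logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical network outputs for one sample
print(soft_cross_entropy(logits, one_hot(0, 4)))
print(soft_cross_entropy(logits, uniform_smoothed(0, 4, alpha=0.1)))
```

The same `soft_cross_entropy` function works for any target distribution that sums to one, so a non-uniform smoothed target can be plugged in without changing the loss.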

Example

python Instance-based-smoothing.py --dataset cifar10 --model resnet18 --num_classes 10

List of arguments accepted by the training and evaluation scripts:

--lr type = float, default = 0.1, help = Starting learning rate (a weight decay of $10^{-4}$ is used).
--tr_size type = float, default = 0.8, help = Size of training set split out of the whole training set (0.2 for validation).
--batch_size type = int, default = 512, help = Batch size of mini-batch training process.
--epochs type = int, default = 100, help = Number of training epochs.
--estop type = int, default = 10, help = Number of epochs without loss improvement leading to early stopping.
--ece_bins type = int, default = 10, help = Number of bins for expected calibration error calculation.
--dataset type = str, help = Name of the dataset to be used (cifar10/cifar100/fashionmnist).
--num_classes type = int, default = 10, help = Number of classes in the dataset.
--model type = str, help = Name of the model to be trained, e.g. resnet18 / resnet50 / inceptionv4 / densetnet (works for FashionMNIST only).
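The argument list above maps naturally onto Python's standard `argparse` module. The following is a minimal sketch of such a parser, assuming the types and defaults listed above; the function name `build_parser` and the description string are hypothetical, and the actual scripts may define their arguments differently (for example, by marking some as required).

```python
import argparse


def build_parser():
    """Parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="Training / evaluation options")
    parser.add_argument("--lr", type=float, default=0.1,
                        help="Starting learning rate")
    parser.add_argument("--tr_size", type=float, default=0.8,
                        help="Fraction of the training set used for training")
    parser.add_argument("--batch_size", type=int, default=512,
                        help="Mini-batch size")
    parser.add_argument("--epochs", type=int, default=100,
                        help="Number of training epochs")
    parser.add_argument("--estop", type=int, default=10,
                        help="Epochs without loss improvement before early stopping")
    parser.add_argument("--ece_bins", type=int, default=10,
                        help="Number of bins for the ECE calculation")
    parser.add_argument("--dataset", type=str,
                        help="Dataset name (cifar10/cifar100/fashionmnist)")
    parser.add_argument("--num_classes", type=int, default=10,
                        help="Number of classes in the dataset")
    parser.add_argument("--model", type=str,
                        help="Name of the model to be trained")
    return parser


# Parse the same options as the example command shown earlier.
args = build_parser().parse_args(
    ["--dataset", "cifar10", "--model", "resnet18", "--num_classes", "10"]
)
print(args.lr, args.batch_size, args.dataset)  # prints: 0.1 512 cifar10
```

Options that are not passed on the command line fall back to the defaults listed above, so the example command only needs to name the dataset, the model, and the number of classes.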

Results

Results of the comparison of different methods on 3 datasets using 4 different architectures are reported in the following table.
The experiments were repeated 3 times, and the mean $\pm$ standard deviation of the log loss, expected calibration error (ECE), accuracy, distilled student network accuracy, and distilled student log loss are reported.
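For readers unfamiliar with ECE, the sketch below shows one common way to compute it: predictions are grouped into equal-width confidence bins, and the gap between average confidence and accuracy in each bin is averaged with weights proportional to bin size. This is a plain-Python illustration under stated assumptions, not the code in utils/evaluate.py; the function name, the half-open binning convention, and the sample values are hypothetical. The `n_bins` parameter plays the role of the `--ece_bins` argument.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted probability of the chosen class, one per sample.
    correct:     1 if the prediction was right, 0 otherwise.
    """
    n = len(confidences)
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins

    for conf, ok in zip(confidences, correct):
        # Bins are [lo, hi); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += ok
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - avg_conf)
    return ece


# Hypothetical predictions: 90% confident each time, but only 3 of 4 correct.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A value of zero means confidence matches accuracy in every occupied bin; overconfident models, as in the example, produce a positive gap.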

A t-SNE visualization of the logits of three different classes in CIFAR-10 is shown below:

Owner

Mohamed Maher
