Code for MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks

Overview

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks

This is the code for the paper:

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei
Presented at ICML 2018

Please note that this is not an officially supported Google product.

If you find this code useful in your research, please cite:

@inproceedings{jiang2018mentornet,
  title={MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels},
  author={Jiang, Lu and Zhou, Zhengyuan and Leung, Thomas and Li, Li-Jia and Fei-Fei, Li},
  booktitle={ICML},
  year={2018}
}

Introduction

We are interested in training a deep network using curriculum learning (Bengio et al., 2009), i.e. learning examples with focus. Each curriculum is implemented as a network (called MentorNet).

  • During training, MentorNet supervises the training of the base network (called StudentNet).
  • At test time, StudentNet makes predictions alone, without MentorNet.

[Figure: Training Overview]

Setups

All code was developed and tested on Nvidia V100/P100 (16GB) GPUs in the following environment.

  • Ubuntu 18.04
  • Python 2.7.15
  • TensorFlow 1.8.0
  • numpy 1.13.3
  • imageio 2.3.0

Download the Cloud SDK (it provides gsutil) to get the data and models. Then download the dataset and the pre-trained MentorNet models, and put them in the same directory that contains the code directory.

gsutil -m cp -r gs://mentornet_project/data .
gsutil -m cp -r gs://mentornet_project/mentornet_models .

Alternatively, you may download the zip files: data and models.

Running MentorNet on CIFAR

export PYTHONPATH="$PYTHONPATH:$PWD/code/"

python code/cifar_train_mentornet.py \
  --dataset_name=cifar10   \
  --trained_mentornet_dir=mentornet_models/models/mentornet_pd1_g_1/mentornet_pd \
  --loss_p_precentile=0.75  \
  --nofixed_epoch_after_burn_in  \
  --burn_in_epoch=0  \
  --example_dropout_rates="0.5,17,0.05,83" \
  --data_dir=data/cifar10/0.2 \
  --train_log_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1/train \
  --studentnet=resnet101 \
  --max_number_of_steps=39000

A full list of commands can be found in this file. The training script has a number of command-line flags that you can use to configure the model architecture, hyperparameters, and input / output settings:

  • --trained_mentornet_dir: Directory where to find the trained MentorNet model, created by mentornet_learning/train.py.
  • --loss_p_percentile: p-percentile used to compute the loss moving average. Default is 0.7.
  • --burn_in_epoch: Number of first epochs to perform burn-in. In the burn-in period, every sample has a fixed 1.0 weight. Default is 0.
  • --fixed_epoch_after_burn_in: Whether to use the fixed epoch as the MentorNet input feature after the burn-in period. Set True for MentorNet DD. Default is False.
  • --loss_moving_average_decay: Decay factor used in moving average. Default is 0.5.
  • --example_dropout_rates: Comma-separated list giving the example drop-out rate over the total of 100 epochs. The format is [dropout rate, epoch_num]+, i.e. a piecewise-constant drop-out rate given as (rate, number of epochs) pairs. The epoch_num values must sum to 100. The drop-out rate is the probability of setting a sample weight to zero, as proposed in (Liang et al., 2016). Default is 0.5, 17, 0.05, 78, 1.0, 5.
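The schedule format and the burn-in rule described above can be illustrated with a minimal sketch. This is not the repository's implementation: the function names `parse_dropout_schedule` and `adjust_weights` are hypothetical, and the assumption that drop-out is skipped during burn-in is ours, not something the flag descriptions state.

```python
import random


def parse_dropout_schedule(spec, total_epochs=100):
    """Expand 'rate,epoch_num,rate,epoch_num,...' into one rate per epoch."""
    values = [float(item) for item in spec.split(",")]
    if not values or len(values) % 2 != 0:
        raise ValueError("expected one or more (rate, epoch_num) pairs")
    schedule = []
    for rate, epoch_num in zip(values[0::2], values[1::2]):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("drop-out rate must lie in [0, 1]")
        schedule.extend([rate] * int(epoch_num))
    if len(schedule) != total_epochs:
        raise ValueError("epoch_num values must sum to %d" % total_epochs)
    return schedule


def adjust_weights(weights, epoch, burn_in_epoch, schedule, rng):
    """Apply burn-in, then example drop-out, to per-sample weights.

    Assumption: during burn-in every weight is fixed to 1.0 and no
    drop-out is applied; afterwards each weight is zeroed with the
    probability scheduled for the current epoch.
    """
    if epoch < burn_in_epoch:
        return [1.0] * len(weights)
    rate = schedule[min(epoch, len(schedule) - 1)]
    return [0.0 if rng.random() < rate else w for w in weights]


# The schedule used in the training command above: 17 epochs at 0.5,
# then 83 epochs at 0.05.
schedule = parse_dropout_schedule("0.5,17,0.05,83")
print(schedule[16], schedule[17])

# With a hypothetical 2-epoch burn-in, epoch 0 ignores the mentor weights.
print(adjust_weights([0.2, 0.9, 0.4], 0, 2, schedule, random.Random(0)))
```

A per-epoch list keeps the lookup trivial at training time; the parser rejects schedules whose epoch counts do not sum to 100, matching the constraint stated for the flag.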

To evaluate a model, run the evaluation job in parallel with the training job (on a different GPU).

python cifar/cifar_eval.py \
 --dataset_name=cifar10 \
 --data_dir=cifar/data/cifar10/val/ \
 --checkpoint_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1/train \
 --eval_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1/eval_val \
 --studentnet=resnet101 \
 --device_id=1

A complete list of commands for running the experiments can be found in commands/train_studentnet_resnet.sh and commands/train_studentnet_inception.sh.

MentorNet Framework

MentorNet is a general framework for curriculum learning, where various curriculums can be learned by the same MentorNet structure with different parameters.

It is flexible: we can switch curriculums by attaching different MentorNets without modifying the pipeline.

We train a few MentorNets, listed below. A MentorNet can be thought of as a hyper-parameter that is tuned for different problems.

Curriculum | Visualization | Intuition | Model Name
No curriculum | image | Assign uniform weight to every sample. | baseline_mentornet
Self-paced (Kumar et al., 2010) | image | Favor samples of smaller loss. | self_paced_mentornet
SPCL linear (Jiang et al., 2015) | image | Discount the weight by loss linearly. | spcl_linear_mentornet
Hard example mining (Felzenszwalb et al., 2008) | image | Favor samples of greater loss. | hard_example_mining_mentornet
Focal loss (Lin et al., 2017) | image | Increase the weight with loss following the exponential CDF. | focal_loss_mentornet
Predefined Mixture | image | Mixture of SPL and SPCL changing by epoch. | mentornet_pd
MentorNet Data-driven | image | Learned on a small subset of the CIFAR data. | mentornet_dd
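To make the "Intuition" column concrete, here is a rough sketch of closed-form weighting rules that match those descriptions. These are illustrative only: the MentorNets in this repository are trained networks, not these formulas, and the threshold `lam`, the exponent `gamma`, and all function names are assumptions introduced for the example.

```python
import math


def uniform_weight(loss, lam):
    """No curriculum: every sample gets the same weight."""
    return 1.0


def self_paced_weight(loss, lam):
    """Self-paced: keep only samples whose loss is below the threshold."""
    return 1.0 if loss < lam else 0.0


def spcl_linear_weight(loss, lam):
    """SPCL linear: weight decreases linearly with loss, reaching 0 at lam."""
    return max(0.0, 1.0 - loss / lam)


def hard_example_mining_weight(loss, lam):
    """Hard example mining: keep only samples at or above the threshold."""
    return 1.0 if loss >= lam else 0.0


def focal_loss_weight(loss, gamma=2.0):
    """Focal-loss style: 1 - exp(-loss) is the exponential CDF.

    For a cross-entropy loss, exp(-loss) is the predicted probability p
    of the true class, so this equals (1 - p) ** gamma.
    """
    return (1.0 - math.exp(-loss)) ** gamma


losses = [0.1, 0.5, 2.0]
for name, rule in [("self_paced", self_paced_weight),
                   ("spcl_linear", spcl_linear_weight),
                   ("hard_example_mining", hard_example_mining_weight)]:
    print(name, [rule(loss, 1.0) for loss in losses])
print("focal_loss", [round(focal_loss_weight(loss), 3) for loss in losses])
```

Each rule maps a per-sample loss to a weight in [0, 1], which is the interface a MentorNet exposes to the StudentNet during training.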

Note that many more curriculums can be trained by MentorNet, for example, prediction variance (Chang et al., 2017), implicit regularizer (Fan et al., 2017), self-paced with diversity (Jiang et al., 2014), sample re-weighting (Dehghani et al., 2018; Ren et al., 2018), etc.

Performance

The numbers differ slightly from the ones reported in the paper because the code was re-implemented on a third-party library.

CIFAR-10 ResNet

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.796 0.822 0.797 0.910 0.914
0.4 0.568 0.802 0.634 0.776 0.887
0.8 0.238 0.297 0.25 0.283 0.463

CIFAR-100 ResNet

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.624 0.652 0.613 0.733 0.726
0.4 0.448 0.509 0.467 0.567 0.675
0.8 0.084 0.089 0.079 0.193 0.301

CIFAR-10 Inception

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.775 0.784 0.747 0.798 0.800
0.4 0.72 0.733 0.695 0.731 0.763
0.8 0.29 0.272 0.309 0.312 0.461

CIFAR-100 Inception

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.42 0.408 0.391 0.451 0.466
0.4 0.346 0.32 0.313 0.386 0.411
0.8 0.108 0.091 0.107 0.125 0.203

Algorithm

We propose an algorithm to optimize the StudentNet model parameter w jointly with a given MentorNet. Unlike alternating minimization, it optimizes w (StudentNet parameters) and v (sample weights) stochastically over mini-batches.

The curriculum can change during training, and MentorNet is updated a few times in the algorithm.
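The mini-batch scheme described above can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the paper's algorithm: the StudentNet is a 1-D logistic model, the MentorNet is replaced by a fixed percentile rule (`mentor_weights`, a hypothetical stand-in that is never updated), and all names and hyper-parameters are invented for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def example_loss(w, b, x, y):
    """Cross-entropy loss of a 1-D logistic StudentNet on one example."""
    p = min(max(sigmoid(w * x + b), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mentor_weights(losses, percentile=0.75):
    """Stand-in mentor: weight 1 for losses at or below the batch percentile."""
    k = max(0, int(math.ceil(percentile * len(losses))) - 1)
    threshold = sorted(losses)[k]
    return [1.0 if loss <= threshold else 0.0 for loss in losses]


def train_student(data, epochs=30, batch_size=20, lr=0.1, seed=0):
    """Mini-batch SGD: v is recomputed per batch, then (w, b) is updated."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # The mentor assigns sample weights v from the current losses.
            v = mentor_weights([example_loss(w, b, x, y) for x, y in batch])
            # One SGD step on the v-weighted loss, with v held fixed.
            total = sum(v)  # at least one weight is always 1
            errors = [sigmoid(w * x + b) - y for x, y in batch]
            grad_w = sum(vi * e * x for vi, e, (x, _) in zip(v, errors, batch)) / total
            grad_b = sum(vi * e for vi, e in zip(v, errors)) / total
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


def make_noisy_data(n=200, noise_fraction=0.2, seed=1):
    """Points in [-3, 3] labeled by sign, with a fraction of labels flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        y = 1 if x > 0 else 0
        if rng.random() < noise_fraction:
            y = 1 - y  # corrupt the label
        data.append((x, y))
    return data


w, b = train_student(make_noisy_data())
print("learned w=%.3f b=%.3f" % (w, b))
```

Because v is recomputed from the current losses on every mini-batch, the set of down-weighted examples shifts as the student improves, which is the sense in which the curriculum changes during training.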

[Figure: Algorithm]

To learn new curriculums (Step 6), see this page.

We found specific MentorNet architectures do not matter that much.

References

  • Bengio, Yoshua, et al. "Curriculum learning." In ICML, 2009.
  • Kumar, M. Pawan, Benjamin Packer, and Daphne Koller. "Self-paced learning for latent variable models." In NIPS, 2010.
  • Jiang, Lu, et al. "Self-paced Learning with Diversity." In NIPS, 2014.
  • Jiang, Lu, et al. "Self-Paced Curriculum Learning." In AAAI, 2015.
  • Liang, Junwei, et al. "Learning to Detect Concepts from Webly-Labeled Video Data." In IJCAI, 2016.
  • Lin, Tsung-Yi, et al. "Focal loss for dense object detection." In ICCV, 2017.
  • Fan, Yanbo, et al. "Self-Paced Learning: an Implicit Regularization Perspective." In AAAI, 2017.
  • Felzenszwalb, Pedro, et al. "A discriminatively trained, multiscale, deformable part model." In CVPR, 2008.
  • Dehghani, Mostafa, et al. "Fidelity-Weighted Learning." In ICLR, 2018.
  • Ren, Mengye, et al. "Learning to reweight examples for robust deep learning." In ICML, 2018.
  • Fan, Yang, et al. "Learning to Teach." In ICLR, 2018.
  • Chang, Haw-Shiuan, et al. "Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples." In NIPS, 2017.