SOTA model in CIFAR10

Overview

A PyTorch Implementation of CIFAR Tricks

调研了CIFAR10数据集上各种trick,数据增强,正则化方法,并进行了实现。目前项目告一段落,如果有更好的想法,或者希望一起维护这个项目可以提issue或者在我的主页找到我的联系方式。

0. Requirements

  • Python 3.6+
  • torch=1.8.0+cu111
  • torchvision+0.9.0+cu111
  • tqdm=4.26.0
  • PyYAML=6.0

1. Implements

1.1 Tricks

  • Warmup
  • Cosine LR Decay
  • SAM
  • Label Smooth
  • KD
  • Adabound
  • Xavier Kaiming init
  • lr finder

1.2 Augmentation

  • Auto Augmentation
  • Cutout
  • Mixup
  • RICAP
  • Random Erase
  • ShakeDrop

2. Training

2.1 CIFAR-10训练示例

WideResNet28-10 baseline on CIFAR-10:

python train.py --dataset cifar10

WideResNet28-10 +RICAP on CIFAR-10:

python train.py --dataset cifar10 --ricap True

WideResNet28-10 +Random Erasing on CIFAR-10:

python train.py --dataset cifar10 --random-erase True

WideResNet28-10 +Mixup on CIFAR-10:

python train.py --dataset cifar10 --mixup True

3. Results

3.1 原pytorch-ricap的结果

Model Error rate Loss Error rate (paper)
WideResNet28-10 baseline 3.82(96.18) 0.158 3.89
WideResNet28-10 +RICAP 2.82(97.18) 0.141 2.85
WideResNet28-10 +Random Erasing 3.18(96.82) 0.114 4.65
WideResNet28-10 +Mixup 3.02(96.98) 0.158 3.02

3.2 Reimplementation结果

Model Error rate Loss Error rate (paper)
WideResNet28-10 baseline 3.78(96.22) 3.89
WideResNet28-10 +RICAP 2.81(97.19) 2.85
WideResNet28-10 +Random Erasing 3.03(96.97) 0.113 4.65
WideResNet28-10 +Mixup 2.93(97.07) 0.158 3.02

3.3 Half data快速训练验证各网络结构

reimplementation models(no augmentation, half data,epoch200,bs128)

Model Error rate Loss
lenet(cpu爆炸) (70.76)
wideresnet 3.78(96.22)
resnet20 (89.72)
senet (92.34)
resnet18 (92.08)
resnet34 (92.48)
resnet50 (91.72)
regnet (92.58)
nasnet out of mem
shake_resnet26_2x32d (93.06)
shake_resnet26_2x64d (94.14)
densenet (92.06)
dla (92.58)
googlenet (91.90) 0.2675
efficientnetb0(利用率低且慢) (86.82) 0.5024
mobilenet(利用率低) (89.18)
mobilenetv2 (91.06)
pnasnet (90.44)
preact_resnet (90.76)
resnext (92.30)
vgg(cpugpu利用率都高) (88.38)
inceptionv3 (91.84)
inceptionv4 (91.10)
inception_resnet_v2 (83.46)
rir (92.34) 0.3932
squeezenet(CPU利用率高) (89.16) 0.4311
stochastic_depth_resnet18 (90.22)
xception
dpn (92.06) 0.3002
ge_resnext29_8x64d (93.86) 巨慢

3.4 测试cpu gpu影响

TEST: scale/kernel ToyNet

修改网络的卷积层深度,并进行训练,可以得到以下结论:

结论:lenet这种卷积量比较少,只有两层的,cpu利用率高,gpu利用率低。在这个基础上增加深度,用vgg那种直筒方式增加深度,发现深度越深,cpu利用率越低,gpu利用率越高。

修改训练过程的batch size,可以得到以下结论:

结论:bs会影响收敛效果。

3.5 StepLR优化下测试cutout和mixup

architecture epoch cutout mixup C10 test acc (%)
shake_resnet26_2x64d 200 96.33
shake_resnet26_2x64d 200 96.99
shake_resnet26_2x64d 200 96.60
shake_resnet26_2x64d 200 96.46

3.6 测试SAM,ASAM,Cosine,LabelSmooth

architecture epoch SAM ASAM Cosine LR Decay LabelSmooth C10 test acc (%)
shake_resnet26_2x64d 200 96.51
shake_resnet26_2x64d 200 96.80
shake_resnet26_2x64d 200 96.61
shake_resnet26_2x64d 200 96.57

PS:其他库在加长训练过程(epoch=1800)情况下可以实现 shake_resnet26_2x64d achieved 97.71% test accuracy with cutout and mixup!!

3.7 测试cosine lr + shake

architecture epoch cutout mixup C10 test acc (%)
shake_resnet26_2x64d 300 96.66
shake_resnet26_2x64d 300 97.21
shake_resnet26_2x64d 300 96.90
shake_resnet26_2x64d 300 96.73

1800 epoch CIFAR ZOO中结果,由于耗时过久,未进行复现。

architecture epoch cutout mixup C10 test acc (%)
shake_resnet26_2x64d 1800 96.94(cifar zoo)
shake_resnet26_2x64d 1800 97.20(cifar zoo)
shake_resnet26_2x64d 1800 97.42(cifar zoo)
shake_resnet26_2x64d 1800 97.71(cifar zoo)

3.8 Divide and Co-training方案研究

  • lr:
    • warmup (20 epoch)
    • cosine lr decay
    • lr=0.1
    • total epoch(300 epoch)
  • bs=128
  • aug:
    • Random Crop and resize
    • Random left-right flipping
    • AutoAugment
    • Normalization
    • Random Erasing
    • Mixup
  • weight decay=5e-4 (bias and bn undecayed)
  • kaiming weight init
  • optimizer: nesterov

复现:((v100:gpu1) 4min*300/60=20h) top1: 97.59% 本项目目前最高值。

python train.py --model 'pyramidnet272' \
                --name 'divide-co-train' \
                --autoaugmentation True \ 
                --random-erase True \
                --mixup True \
                --epochs 300 \
                --sched 'warmcosine' \
                --optims 'nesterov' \
                --bs 128 \
                --root '/home/dpj/project/data'

3.9 测试多种数据增强

architecture epoch cutout mixup autoaugment random-erase C10 test acc (%)
shake_resnet26_2x64d 200 96.42
shake_resnet26_2x64d 200 96.49
shake_resnet26_2x64d 200 96.17
shake_resnet26_2x64d 200 96.25
shake_resnet26_2x64d 200 96.20
shake_resnet26_2x64d 200 95.82
shake_resnet26_2x64d 200 96.02
shake_resnet26_2x64d 200 96.00
shake_resnet26_2x64d 200 95.83
shake_resnet26_2x64d 200 95.89
shake_resnet26_2x64d 200 96.25
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_orgin' --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_c' --cutout True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_m' --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_a' --autoaugmentation True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_r' --random-erase True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cm'  --cutout True --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ca' --cutout True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cr' --cutout True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ma' --mixup True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_mr' --mixup True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ar' --autoaugmentation True --random-erase True  --bs 64

4. Reference

[1] https://github.com/BIGBALLON/CIFAR-ZOO

[2] https://github.com/pprp/MutableNAS

[3] https://github.com/clovaai/CutMix-PyTorch

[4] https://github.com/4uiiurz1/pytorch-ricap

[5] https://github.com/NUDTNASLab/pytorch-image-models

[6] https://github.com/facebookresearch/LaMCTS

[7] https://github.com/Alibaba-MIIL/ImageNet21K

Owner
PJDong
Computer vision learner, deep learner
PJDong
Hyperparameters tuning and features selection are two common steps in every machine learning pipeline.

shap-hypetune A python package for simultaneous Hyperparameters Tuning and Features Selection for Gradient Boosting Models. Overview Hyperparameters t

Marco Cerliani 422 Jan 08, 2023
Huawei Hackathon 2021 - Sweden (Stockholm)

huawei-hackathon-2021 Contributors DrakeAxelrod Challenge Requirements: python=3.8.10 Standard libraries (no importing) Important factors: Data depend

Drake Axelrod 32 Nov 08, 2022
Multi Task Vision and Language

12-in-1: Multi-Task Vision and Language Representation Learning Please cite the following if you use this code. Code and pre-trained models for 12-in-

Facebook Research 712 Dec 19, 2022
Few-shot NLP benchmark for unified, rigorous eval

FLEX FLEX is a benchmark and framework for unified, rigorous few-shot NLP evaluation. FLEX enables: First-class NLP support Support for meta-training

AI2 85 Dec 03, 2022
A Tensorflow implementation of the Text Conditioned Auxiliary Classifier Generative Adversarial Network for Generating Images from text descriptions

A Tensorflow implementation of the Text Conditioned Auxiliary Classifier Generative Adversarial Network for Generating Images from text descriptions

Ayushman Dash 93 Aug 04, 2022
City-Scale Multi-Camera Vehicle Tracking Guided by Crossroad Zones Code

City-Scale Multi-Camera Vehicle Tracking Guided by Crossroad Zones Requirements Python 3.8 or later with all requirements.txt dependencies installed,

88 Dec 12, 2022
Chainer implementation of recent GAN variants

Chainer-GAN-lib This repository collects chainer implementation of state-of-the-art GAN algorithms. These codes are evaluated with the inception score

399 Oct 23, 2022
Official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer"

[AAAI2022] UCTransNet This repo is the official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspectiv

Haonan Wang 199 Jan 03, 2023
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more

PyTorch Image Models Sponsors What's New Introduction Models Features Results Getting Started (Documentation) Train, Validation, Inference Scripts Awe

Ross Wightman 22.9k Jan 09, 2023
Codes for AAAI 2022 paper: Context-aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs

Context-Aware-Healthcare Codes for AAAI 2022 paper: Context-aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs Download

LuChang 9 Dec 26, 2022
PyTorch implementation of Barlow Twins.

Barlow Twins: Self-Supervised Learning via Redundancy Reduction PyTorch implementation of Barlow Twins. @article{zbontar2021barlow, title={Barlow Tw

Facebook Research 839 Dec 29, 2022
Clustering is a popular approach to detect patterns in unlabeled data

Visual Clustering Clustering is a popular approach to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a data

Tarek Naous 24 Nov 11, 2022
A general python framework for visual object tracking and video object segmentation, based on PyTorch

PyTracking A general python framework for visual object tracking and video object segmentation, based on PyTorch. 📣 Two tracking/VOS papers accepted

2.6k Jan 04, 2023
Pytorch Lightning 1.2k Jan 06, 2023
Apache Flink

Apache Flink Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. Learn more about Flin

The Apache Software Foundation 20.4k Dec 30, 2022
Natural Intelligence is still a pretty good idea.

Human Learn Machine Learning models should play by the rules, literally. Project Goal Back in the old days, it was common to write rule-based systems.

vincent d warmerdam 641 Dec 26, 2022
Free-duolingo-plus - Duolingo account creator that uses your invite code to get you free duolingo plus

free-duolingo-plus duolingo account creator that uses your invite code to get yo

1 Jan 06, 2022
A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

DeepKE is a knowledge extraction toolkit supporting low-resource and document-level scenarios for entity, relation and attribute extraction. We provide comprehensive documents, Google Colab tutorials

ZJUNLP 1.6k Jan 05, 2023
Code to reproduce results from the paper "AmbientGAN: Generative models from lossy measurements"

AmbientGAN: Generative models from lossy measurements This repository provides code to reproduce results from the paper AmbientGAN: Generative models

Ashish Bora 87 Oct 19, 2022
A curated list of neural network pruning resources.

A curated list of neural network pruning and related resources. Inspired by awesome-deep-vision, awesome-adversarial-machine-learning, awesome-deep-learning-papers and Awesome-NAS.

Yang He 1.7k Jan 09, 2023