The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

Overview

[ICLR 2022] The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training
Shiwei Liu, Tianlong Chen, Xiaohan Chen, Li Shen, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy

https://openreview.net/forum?id=VBZJ_3tz-t

Abstract: Random pruning is arguably the most naive way to attain sparsity in neural networks, but has been deemed uncompetitive by either post-training pruning or sparse training. In this paper, we focus on sparse training and highlight a perhaps counter-intuitive finding, that random pruning at initialization (PaI) can be quite powerful for the sparse training of modern neural networks. Without any delicate pruning criteria or carefully pursued sparsity structures, we empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent. There are two key factors that contribute to this revival: (i) the network sizes matter: as the original dense networks grow wider and deeper, the performance of training a randomly pruned sparse network will quickly grow to matching that of its dense equivalent, even at high sparsity ratios; (ii) appropriate layer-wise sparsity ratios can be pre-chosen for sparse training, which shows to be another important performance booster. Simple as it looks, a randomly pruned subnetwork of Wide ResNet-50 can be sparsely trained to match the accuracy of a dense Wide ResNet-50, on ImageNet. We also observed such randomly pruned networks outperform dense counterparts in other favorable aspects, such as out-of-distribution detection, uncertainty estimation, and adversarial robustness. Overall, our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning.

This code base is created by Shiwei Liu [email protected] during his Ph.D. at Eindhoven University of Technology.

Requirements

Python 3.6, PyTorch v1.5.1, and CUDA v10.2.

How to Run Experiments

[Training module] The training module is controlled by the following arguments:

  • --sparse - Enable sparse mode (remove this if want to train dense model)
  • --fix - Fix the sparse pattern during training (remove this if want to with dynamic sparse training)
  • --sparse-init - Type of sparse initialization. Choose from: uniform, uniform_plus, ERK, ERK_plus, ER, snip (snip ratio), GraSP (GraSP ratio)
  • --model (str) - cifar_resnet_A_B, where A is the depths and B is the width, e.g., cifar_resnet_20_32
  • --density (float) - density level (default 0.05)

CIFAR-10/100 Experiments

To train ResNet with various depths on CIFAR10/100:

for model in cifar_resnet_20 cifar_resnet_32 cifar_resnet_44 cifar_resnet_56 cifar_resnet_110 
do
    python main.py --sparse --seed 17 --sparse_init ERK --fix --lr 0.1 --density 0.05 --model $model --data cifar10 --epoch 160
done

To train ResNet with various depths on CIFAR10/100:

for model in cifar_resnet_20_8 cifar_resnet_20_16 cifar_resnet_20_24 
do
    python main.py --sparse --seed 17 --sparse_init ERK --fix --lr 0.1 --density 0.05 --model $model --data cifar10 --epoch 160
done

ImageNet Experiments

To train WideResNet50_2 on ImageNet with ERK_plus:

cd ImageNet
python $1multiproc.py --nproc_per_node 4 $1main.py --sparse_init ERK_plus --fc_density 1.0 --fix --fp16 --master_port 5556 -j 10 -p 500 --arch WideResNet50_2 -c fanin --label-smoothing 0.1 -b 192 --lr 0.4 --warmup 5 --epochs 100 --density 0.2 --static-loss-scale 256 $2 ../../../../../../data1/datasets/imagenet2012/ --save save/

Citation

if you find this repo is helpful, please cite

@inproceedings{
liu2022the,
title={The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training},
author={Shiwei Liu and Tianlong Chen and Xiaohan Chen and Li Shen and Decebal Constantin Mocanu and Zhangyang Wang and Mykola Pechenizkiy},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=VBZJ_3tz-t}
}
Owner
VITA
Visual Informatics Group @ University of Texas at Austin
VITA
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting This is the origin Pytorch implementation of Informer in the followin

Haoyi 3.1k Dec 29, 2022
Code for our paper Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation

CorDA Code for our paper Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation Prerequisite Please create and activate the follo

Qin Wang 60 Nov 30, 2022
Age Progression/Regression by Conditional Adversarial Autoencoder

Age Progression/Regression by Conditional Adversarial Autoencoder (CAAE) TensorFlow implementation of the algorithm in the paper Age Progression/Regre

Zhifei Zhang 603 Dec 22, 2022
StyleGAN - Official TensorFlow Implementation

StyleGAN — Official TensorFlow Implementation Picture: These people are not real – they were produced by our generator that allows control over differ

NVIDIA Research Projects 13.1k Jan 09, 2023
DeepFaceLab fork which provides IPython Notebook to use DFL with Google Colab

DFL-Colab — DeepFaceLab fork for Google Colab This project provides you IPython Notebook to use DeepFaceLab with Google Colaboratory. You can create y

779 Jan 05, 2023
DeepLab resnet v2 model in pytorch

pytorch-deeplab-resnet DeepLab resnet v2 model implementation in pytorch. The architecture of deepLab-ResNet has been replicated exactly as it is from

Isht Dwivedi 601 Dec 22, 2022
This repository is an official implementation of the paper MOTR: End-to-End Multiple-Object Tracking with TRansformer.

MOTR: End-to-End Multiple-Object Tracking with TRansformer This repository is an official implementation of the paper MOTR: End-to-End Multiple-Object

348 Jan 07, 2023
ChainerRL is a deep reinforcement learning library built on top of Chainer.

ChainerRL and PFRL ChainerRL (this repository) is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement al

Chainer 1.1k Jan 01, 2023
Source code for CIKM 2021 paper for Relation-aware Heterogeneous Graph for User Profiling

RHGN Source code for CIKM 2021 paper for Relation-aware Heterogeneous Graph for User Profiling Dependencies torch==1.6.0 torchvision==0.7.0 dgl==0.7.1

Big Data and Multi-modal Computing Group, CRIPAC 6 Nov 29, 2022
Code for EMNLP 2021 paper Contrastive Out-of-Distribution Detection for Pretrained Transformers.

Contra-OOD Code for EMNLP 2021 paper Contrastive Out-of-Distribution Detection for Pretrained Transformers. Requirements PyTorch Transformers datasets

Wenxuan Zhou 27 Oct 28, 2022
ICCV2021 - Mining Contextual Information Beyond Image for Semantic Segmentation

Introduction The official repository for "Mining Contextual Information Beyond Image for Semantic Segmentation". Our full code has been merged into ss

55 Nov 09, 2022
Contains source code for the winning solution of the xView3 challenge

Winning Solution for xView3 Challenge This repository contains source code and pretrained models for my (Eugene Khvedchenya) solution to xView 3 Chall

Eugene Khvedchenya 51 Dec 30, 2022
Artificial Neural network regression model to predict the energy output in a combined cycle power plant.

Energy_Output_Predictor Artificial Neural network regression model to predict the energy output in a combined cycle power plant. Abstract Energy outpu

1 Feb 11, 2022
SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)

SAT: 2D Semantics Assisted Training for 3D Visual Grounding SAT: 2D Semantics Assisted Training for 3D Visual Grounding by Zhengyuan Yang, Songyang Zh

Zhengyuan Yang 22 Nov 30, 2022
Pytorch implementation for reproducing StackGAN_v2 results in the paper StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

StackGAN-v2 StackGAN-v1: Tensorflow implementation StackGAN-v1: Pytorch implementation Inception score evaluation Pytorch implementation for reproduci

Han Zhang 809 Dec 16, 2022
LBK 35 Dec 26, 2022
GNEE - GAT Neural Event Embeddings

GNEE - GAT Neural Event Embeddings This repository contains source code for the GNEE (GAT Neural Event Embeddings) method introduced in the paper: "Se

João Pedro Rodrigues Mattos 0 Sep 15, 2021
The authors' official PyTorch SigWGAN implementation

The authors' official PyTorch SigWGAN implementation This repository is the official implementation of [Sig-Wasserstein GANs for Time Series Generatio

9 Jun 16, 2022
Oriented Response Networks, in CVPR 2017

Oriented Response Networks [Home] [Project] [Paper] [Supp] [Poster] Torch Implementation The torch branch contains: the official torch implementation

ZhouYanzhao 217 Dec 12, 2022
Keras Image Embeddings using Contrastive Loss

Image to Embedding projection in vector space. Implementation in keras and tensorflow of batch all triplet loss for one-shot/few-shot learning.

Shravan Anand K 5 Mar 21, 2022