SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

Last update: Dec 16, 2022

SLIDE

The SLIDE package contains the source code for reproducing the main experiments in the SLIDE paper.

Dataset

The Amazon-670K dataset must be downloaded separately. Note that the data is sorted by label, so please shuffle at least the validation/testing data.
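The shuffling step can be sketched with standard GNU tools. This is a minimal sketch, not part of the SLIDE package: the function name `shuffle_keep_header` and the file names are hypothetical, and it assumes the first line of the file is a header that must stay in place. Pass `0` as the third argument if your file has no header.

```shell
# Shuffle the data lines of a dataset file while keeping its header in place.
# Assumption: the first header_lines lines (default 1) are a header, not samples.
shuffle_keep_header() {
    local input="$1"
    local output="$2"
    local header_lines="${3:-1}"
    {
        # Copy the header unchanged.
        head -n "$header_lines" "$input"
        # Shuffle everything after the header.
        tail -n "+$((header_lines + 1))" "$input" | shuf
    } > "$output"
}

# Example with hypothetical file names:
# shuffle_keep_header test.txt test_shuffled.txt
```

Writing to a separate output file keeps the original sorted file intact, so the shuffle can be repeated if needed.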

TensorFlow Baselines

We suggest using the TensorFlow Docker image directly to install TensorFlow-GPU. For TensorFlow-CPU compiled with AVX2, we recommend using a precompiled build.

There is also a TensorFlow Docker image built specifically for CPUs with AVX-512 instructions. To get it, run:

docker pull clearlinux/stacks-dlrs_2-mkl

config.py controls TensorFlow training parameters such as the learning rate. example_full_softmax.py and example_sampled_softmax.py are example scripts for the Amazon-670K dataset with full softmax and sampled softmax, respectively.

Build/Run on Intel platform

Prerequisites:

CMake >= 3.0
Intel Compiler (ICC) >= 19

Build with ICC compiler

source /opt/intel/compilers_and_libraries/linux/bin/compilervars.sh -arch intel64 -platform linux
cd /path/to/slide-root
mkdir -p bin && cd bin
# Run exactly one of the cmake commands below, matching the target CPU.
# BDW (Broadwell, AVX2)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc
# SKX/CLX (Skylake/Cascade Lake, AVX512)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc -DOPT_AVX512=1
# CPX (Cooper Lake, AVX512 + BF16)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc -DOPT_AVX512=1 -DOPT_AVX512_BF16=1
make -j
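If you are unsure which of the three configurations applies, the choice can be derived from the CPU feature flags that Linux reports in /proc/cpuinfo. The helper below is a hypothetical sketch, not part of the SLIDE build system. It assumes that the `avx512f` flag is a sufficient proxy for the AVX512 build and `avx512_bf16` for the BF16 build; the build may require further AVX-512 subsets that this check does not cover.

```shell
# Print the extra SLIDE cmake options for a space-separated CPU flag list.
# Prints an empty line when only the plain (AVX2) build applies.
slide_cmake_opts() {
    local flags
    local opts
    flags=" $1 "   # pad with spaces so each flag matches as a whole word
    opts=""
    case "$flags" in
        *" avx512f "*)
            opts="-DOPT_AVX512=1"
            # BF16 is only enabled on top of the AVX512 build.
            case "$flags" in
                *" avx512_bf16 "*) opts="$opts -DOPT_AVX512_BF16=1" ;;
            esac
            ;;
    esac
    echo "$opts"
}

# Usage on Linux (left unquoted on purpose so the options split into words):
# cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
# cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc $(slide_cmake_opts "$cpu_flags")
```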

Run on Intel SKX/CLX/CPX

cd bin
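OMP_NUM_THREADS=<threads> KMP_HW_SUBSET=<sockets>s,<cores-per-socket>c,<threads-per-core>t KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200 ./runme ../SLIDE/Config_amz.csv
For example, on a CLX 8280 with 2 sockets x 28 cores and 2 hardware threads per core (112 logical cores):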
OMP_NUM_THREADS=112 KMP_HW_SUBSET=2s,28c,2t KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200 ./runme ../SLIDE/Config_amz.csv
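In the example above, the thread count equals sockets x cores per socket x threads per core (2 x 28 x 2 = 112). Assuming that relationship is intended in general, the two topology-dependent settings can be generated together. `slide_omp_env` is a hypothetical helper name, not part of the package.

```shell
# Print OMP_NUM_THREADS and KMP_HW_SUBSET for a given CPU topology.
# Arguments: sockets, cores per socket, hardware threads per core.
# Assumption: the thread count is the product of the three values.
slide_omp_env() {
    local sockets="$1"
    local cores="$2"
    local threads="$3"
    echo "OMP_NUM_THREADS=$((sockets * cores * threads)) KMP_HW_SUBSET=${sockets}s,${cores}c,${threads}t"
}

slide_omp_env 2 28 2   # prints: OMP_NUM_THREADS=112 KMP_HW_SUBSET=2s,28c,2t
```

The output can be passed to `env`, for example `env $(slide_omp_env 2 28 2) KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200 ./runme ../SLIDE/Config_amz.csv`.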

For best performance, set Batchsize in SLIDE/Config_amz.csv to a multiple of the number of logical cores.

Results are written to the log file in the dataset directory:
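A batch size that satisfies this rule can be computed by rounding a desired value up to the next multiple of the logical core count. The sketch below is illustrative only: `round_up_to_multiple` is a hypothetical helper, and the result still has to be written into SLIDE/Config_amz.csv by hand.

```shell
# Round a value up to the nearest multiple of another (both positive integers).
round_up_to_multiple() {
    local value="$1"
    local multiple="$2"
    echo $(( (value + multiple - 1) / multiple * multiple ))
}

# A desired batch size of 256 on 112 logical cores:
round_up_to_multiple 256 112   # prints 336
```

On Linux, `round_up_to_multiple 256 "$(nproc)"` uses the number of processing units available to the current process, which may be lower than the machine total if CPU affinity is restricted.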

tail -f dataset/log.txt

Owner

Intel Labs
