SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

Last update: Dec 16, 2022

SLIDE

The SLIDE package contains the source code for reproducing the main experiments in this paper.

Dataset

The dataset can be downloaded from Amazon-670K. Note that the data is sorted by label, so please shuffle at least the validation/testing data.
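
The shuffle can be done with any tool. As a minimal sketch in Python, assuming the data file is line-oriented with one sample per line and possibly one header line that must stay first (the function name and the `has_header` flag are illustrative and not part of the SLIDE package):

```python
import random


def shuffle_lines(lines, seed=0, has_header=True):
    """Return the lines in random order, keeping an optional header line first."""
    # Assumption: if has_header is True, the first line is metadata, not a sample.
    header = list(lines[:1]) if has_header else []
    body = list(lines[1:]) if has_header else list(lines)
    rng = random.Random(seed)  # a fixed seed keeps the shuffle reproducible
    rng.shuffle(body)
    return header + body
```

To apply it to a file, read the file with `readlines()`, pass the result to `shuffle_lines`, and write the returned list to a new file with `writelines()`. Set `has_header=False` if the file has no header line.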

TensorFlow Baselines

We suggest using the TensorFlow Docker image directly to install TensorFlow-GPU. For TensorFlow-CPU compiled with AVX2, we recommend using this precompiled build.

There is also a TensorFlow Docker image built specifically for CPUs with AVX-512 instructions. To get it, run:

docker pull clearlinux/stacks-dlrs_2-mkl

config.py controls TensorFlow training parameters such as the learning rate. example_full_softmax.py and example_sampled_softmax.py are example scripts for the Amazon-670K dataset with full softmax and sampled softmax, respectively.

Build/Run on Intel platform

Prerequisites:

CMake >= 3.0
Intel Compiler (ICC) >= 19

Build with ICC compiler

# Set up the Intel compiler environment
source /opt/intel/compilers_and_libraries/linux/bin/compilervars.sh -arch intel64 -platform linux
cd /path/to/slide-root
mkdir -p bin && cd bin
# Run only one of the cmake commands below, matching the target CPU.
# BDW, Broadwell (AVX2)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc
# SKX/CLX, Skylake/Cascade Lake (AVX512)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc -DOPT_AVX512=1
# CPX, Cooper Lake (AVX512 + BF16)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc -DOPT_AVX512=1 -DOPT_AVX512_BF16=1
make -j

Run on Intel SKX/CLX/CPX

cd bin
OMP_NUM_THREADS=<N> KMP_HW_SUBSET=<N>s,<N>c,<N>t KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200 ./runme ../SLIDE/Config_amz.csv
For example, on CLX8280 2Sx28c:
OMP_NUM_THREADS=112 KMP_HW_SUBSET=2s,28c,2t KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200 ./runme ../SLIDE/Config_amz.csv
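
The settings in the example above follow from the machine topology. The sketch below derives them in Python, assuming (as in the CLX8280 example) that OMP_NUM_THREADS equals sockets x cores per socket x threads per core. The helper name `openmp_env` is hypothetical and not part of the SLIDE package.

```python
def openmp_env(sockets, cores_per_socket, threads_per_core):
    """Build the OpenMP/KMP environment settings for a given CPU topology."""
    # Assumption: every hardware thread on the machine runs one OpenMP thread.
    total_threads = sockets * cores_per_socket * threads_per_core
    return {
        "OMP_NUM_THREADS": str(total_threads),
        "KMP_HW_SUBSET": f"{sockets}s,{cores_per_socket}c,{threads_per_core}t",
        # The two values below are copied unchanged from the run command.
        "KMP_AFFINITY": "compact,granularity=fine",
        "KMP_BLOCKTIME": "200",
    }


# CLX8280: 2 sockets x 28 cores, 2 hardware threads per core
env = openmp_env(2, 28, 2)
print(" ".join(f"{key}={value}" for key, value in env.items()))
```

The printed string can be placed in front of `./runme ../SLIDE/Config_amz.csv`.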

For best performance, set Batchsize in SLIDE/Config_amz.csv to a multiple of the number of logical cores.
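
As a rough illustration of picking such a value, the following Python sketch rounds a desired batch size up to the nearest multiple of the logical core count. The helper name is hypothetical, and rounding up rather than down is an assumption made here, not a recommendation from the SLIDE authors.

```python
def round_up_to_multiple(batch_size, logical_cores):
    """Return the smallest multiple of logical_cores that is >= batch_size."""
    if batch_size <= 0 or logical_cores <= 0:
        raise ValueError("batch_size and logical_cores must be positive")
    # Ceiling division using integers only, then scale back up.
    return -(-batch_size // logical_cores) * logical_cores


# On a machine with 112 logical cores (2 sockets x 28 cores x 2 threads),
# a desired batch size of 256 becomes 336 (3 x 112).
print(round_up_to_multiple(256, 112))
```

The resulting number is what would be written as the Batchsize value in SLIDE/Config_amz.csv.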

Results can be checked in the log file under dataset:

tail -f dataset/log.txt

Owner

Intel Labs