Near-Optimal Sparse Allreduce for Distributed Deep Learning (published in PPoPP'22)

Last update: Oct 29, 2022

Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer. The allreduce has a communication volume of less than 6k, which is asymptotically optimal, where k is the number of gradient values each worker selects. Convergence of the scheme is proved theoretically and validated empirically.
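To make the idea of a sparse allreduce concrete, here is a minimal single-process sketch in plain Python. It is not the Ok-Topk algorithm and does not reproduce its communication pattern. It only shows the two generic ingredients that sparse-gradient training relies on: local top-k selection by magnitude, and summing the sparse contributions of all workers. The helper names `topk_sparsify` and `sparse_sum` are hypothetical, and keeping the unselected entries as a local residual is an assumption borrowed from common top-k training practice.

```python
import heapq
from typing import Dict, List, Tuple


def topk_sparsify(grad: List[float], k: int) -> Tuple[Dict[int, float], List[float]]:
    """Keep the k largest-magnitude entries of grad; return (sparse, residual)."""
    if not 0 <= k <= len(grad):
        raise ValueError("k must be between 0 and len(grad)")
    # Indices of the k entries with the largest absolute value.
    top_idx = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    sparse = {i: grad[i] for i in top_idx}
    # Assumption: entries that were not selected stay behind as a local residual.
    residual = [0.0 if i in sparse else g for i, g in enumerate(grad)]
    return sparse, residual


def sparse_sum(contributions: List[Dict[int, float]]) -> Dict[int, float]:
    """Reference result of a sparse allreduce: the index-wise sum of all sparse vectors."""
    total: Dict[int, float] = {}
    for sparse in contributions:
        for i, v in sparse.items():
            total[i] = total.get(i, 0.0) + v
    return total


# Two simulated workers, each holding a dense local gradient.
worker_grads = [
    [0.1, -2.0, 0.3, 4.0],
    [1.5, -0.2, 0.0, 3.0],
]
sparse_parts = [topk_sparsify(g, k=2)[0] for g in worker_grads]
reduced = sparse_sum(sparse_parts)
```

Each worker contributes only k index-value pairs, so the cost of the reduction depends on k rather than on the full gradient length. Arranging the exchange so that the per-worker volume stays bounded is the part a real sparse allreduce has to solve.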

Set up the environment

To install the required Python modules:

conda create --name py38_oktopk python=3.8

conda activate py38_oktopk

pip3 install pip==20.2.4

pip install -r requirements.txt

MPICC="cc -shared" pip install --no-binary=mpi4py mpi4py

git clone https://github.com/NVIDIA/apex

cd apex

pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
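After the installation steps, a quick check that the key modules are visible to the active interpreter can save a failed job submission. The sketch below uses only the standard library. The helper name `check_modules` is hypothetical, and the module list (`torch`, `mpi4py`, `apex`) is an assumption about what the training scripts import; adjust it to match requirements.txt.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether this interpreter can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed module list; edit to match the project's actual dependencies.
    for name, found in check_modules(["torch", "mpi4py", "apex"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated `py38_oktopk` environment so that it inspects the same interpreter the jobs will use.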

Prepare Datasets

CIFAR-10 for VGG

cd ./VGG/vgg_data

wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

tar -zxvf cifar-10-python.tar.gz
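To confirm that the archive extracted correctly, one batch file can be inspected from Python. This is a minimal sketch, assuming the standard layout of the CIFAR-10 python archive: a `cifar-10-batches-py` directory whose batch files are pickled dictionaries with `data` and `labels` entries. The helper name `summarize_cifar_batch` is hypothetical and is not part of the repository.

```python
import pickle
from pathlib import Path
from typing import Tuple


def summarize_cifar_batch(path: Path) -> Tuple[int, int]:
    """Return (number of images, bytes per image) for one CIFAR-10 python batch file."""
    with open(path, "rb") as f:
        # The batches were pickled with Python 2, so keys load as bytes.
        batch = pickle.load(f, encoding="bytes")
    data = batch[b"data"]
    labels = batch[b"labels"]
    if len(data) != len(labels):
        raise ValueError("data and labels differ in length")
    return len(data), len(data[0])


if __name__ == "__main__":
    # Run from ./VGG/vgg_data after extracting the archive.
    batch_file = Path("cifar-10-batches-py") / "data_batch_1"
    if batch_file.exists():
        print(summarize_cifar_batch(batch_file))
```

Loading the real batches requires NumPy, because the image data is stored as a NumPy array. Only unpickle files from a source you trust, since `pickle.load` can execute arbitrary code.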

AN4 for LSTM

cd ./LSTM/audio_data

wget https://www.dropbox.com/s/l5w4up20u5pfjxf/an4.zip

unzip an4.zip

Wikipedia for BERT

cd ./BERT/bert/bert_data/

Prepare the dataset according to the README file.

Run jobs

We run experiments on GPU clusters with the SLURM job scheduler. To evaluate the performance of Ok-Topk, Gaussiank, gtopk, topkA, topkDSA, and dense, run the jobs as follows.

To run VGG jobs

cd ./VGG

./sbatch_vgg_jobs.sh

To run LSTM jobs

cd ./LSTM

./sbatch_lstm_jobs.sh

To run BERT jobs

cd ./BERT/bert/

./sbatch_bert_jobs.sh

Publication

The Ok-Topk work is published in PPoPP'22.

License

See LICENSE.

Owner

Shigang Li