Near-Optimal Sparse Allreduce for Distributed Deep Learning (published in PPoPP'22)

Last update: Oct 29, 2022

Overview

Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm (communication volume of less than 6k, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer. Its convergence is proved theoretically and confirmed empirically.
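To make the idea of a top-k sparse allreduce concrete, here is a minimal single-process sketch. It is a generic illustration, not the Ok-Topk algorithm itself: it does no real communication, and the function names `topk_sparsify` and `sparse_allreduce` are hypothetical helpers that do not come from this repository. Each simulated worker keeps only its k largest-magnitude gradient entries, the sparse contributions are summed, and the k largest-magnitude sums are kept.

```python
import heapq


def topk_sparsify(grad, k):
    """Return {index: value} for the k entries of grad with the largest magnitude."""
    idx = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    return {i: grad[i] for i in idx}


def sparse_allreduce(sparse_grads, k):
    """Sum the sparse contributions of all workers, then keep the global top-k.

    Illustrative only: a real implementation exchanges data between processes.
    """
    total = {}
    for sg in sparse_grads:
        for i, v in sg.items():
            total[i] = total.get(i, 0.0) + v
    keep = heapq.nlargest(k, total, key=lambda i: abs(total[i]))
    return {i: total[i] for i in sorted(keep)}


# Two simulated workers, each holding a dense gradient of length 4.
workers = [
    [0.1, -2.0, 0.3, 1.5],
    [0.2, -1.0, 2.5, -0.1],
]
local = [topk_sparsify(g, k=2) for g in workers]
result = sparse_allreduce(local, k=2)
print(result)
```

In this toy run, only index/value pairs are combined rather than full dense vectors, which is the reason sparse schemes can reduce communication volume.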

Set up the environment

To install the required Python modules:

conda create --name py38_oktopk python=3.8

conda activate py38_oktopk

pip3 install pip==20.2.4

pip install -r requirements.txt

MPICC="cc -shared" pip install --no-binary=mpi4py mpi4py

git clone https://github.com/NVIDIA/apex

cd apex

pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
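After the installation steps above, a quick check that the two packages built from source are visible to the interpreter can save a failed job submission. The sketch below is an assumption about how one might verify this; `missing_modules` is a hypothetical helper, and the module names `mpi4py` and `apex` are taken from the commands above. It only checks that the modules can be located, not that they were built with the intended MPI or CUDA support.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names assumed from the install commands above.
    missing = missing_modules(["mpi4py", "apex"])
    print("missing:", missing if missing else "none")
```

Run it inside the activated `py38_oktopk` environment so that the check uses the same interpreter as the training jobs.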

Prepare Datasets

CIFAR-10 for VGG

cd ./VGG/vgg_data

wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

tar -zxvf cifar-10-python.tar.gz
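To confirm that the archive extracted correctly, one can load a single batch file. This is a minimal sketch, assuming it is run from `./VGG/vgg_data` and that the archive unpacked into the standard `cifar-10-batches-py` directory; `load_cifar_batch` is a hypothetical helper and not part of this repository. The batches were pickled under Python 2, hence `encoding="bytes"` and the byte-string keys; unpickling the real files also requires NumPy to be installed.

```python
import pickle
from pathlib import Path


def load_cifar_batch(path):
    """Load one pickled CIFAR-10 batch and return (data, labels)."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


# Assumed location after extraction; adjust if the data lives elsewhere.
batch_file = Path("cifar-10-batches-py") / "data_batch_1"
if batch_file.exists():
    data, labels = load_cifar_batch(batch_file)
    print(len(labels), "labels loaded")
```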

AN4 for LSTM

cd ./LSTM/audio_data

wget https://www.dropbox.com/s/l5w4up20u5pfjxf/an4.zip

unzip an4.zip

Wikipedia for BERT

cd ./BERT/bert/bert_data/

Prepare the dataset according to the README file.

Run jobs

We run experiments on GPU clusters with the SLURM job scheduler. To evaluate the performance of Ok-Topk, Gaussiank, gtopk, topkA, topkDSA, and dense, run the jobs as follows.

To run VGG jobs

cd ./VGG

./sbatch_vgg_jobs.sh

To run LSTM jobs

cd ./LSTM

./sbatch_lstm_jobs.sh

To run BERT jobs

cd ./BERT/bert/

./sbatch_bert_jobs.sh

Publication

The Ok-Topk work is published in PPoPP'22. DOI

License

See LICENSE.

Owner

Shigang Li
