Official PyTorch implementation for "Low Precision Decentralized Distributed Training with Heterogenous Data"

Last update: Nov 23, 2021

Related tags

Deep Learning Low_Precision_DL

Overview

Low Precision Decentralized Training with Heterogenous Data

[Paper]

Abstract

Decentralized distributed learning is the key to enabling large-scale machine learning (training) on edge devices utilizing private user-generated local data, without relying on the cloud. However, practical realization of such on-device training is limited by the communication bottleneck, the computational complexity of training deep models, and significant data distribution skew across devices. Many feedback-based compression techniques have been proposed in the literature to reduce the communication cost, and a few works propose algorithmic changes that improve the convergence rate to aid performance in the presence of skewed data distributions. To the best of our knowledge, there is no work in the literature that applies and demonstrates compute-efficient training techniques such as quantization and pruning for peer-to-peer decentralized learning setups. In this paper, we analyze and show the convergence of low precision decentralized training that aims to reduce the computational complexity of training and inference. Further, we study the effect of the degree of skew and communication compression on low precision decentralized training over various computer vision and Natural Language Processing (NLP) tasks. Our experiments indicate that 8-bit decentralized training has minimal accuracy loss compared to its full precision counterpart, even with heterogeneous data. However, when low precision training is accompanied by communication compression through sparsification, we observe a 1-2% drop in accuracy. The proposed low precision decentralized training decreases computational complexity, memory usage, and communication cost by ~4x while trading off less than 1% accuracy for both IID and non-IID data. In particular, with higher skew values, we observe an increase in accuracy (by ~0.5%) with low precision training, indicating the regularization effect of the quantization.
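The ~4x figure in the abstract is consistent with moving from 32-bit floats to 8-bit integers. The snippet below is a minimal sketch of that arithmetic, assuming a simple symmetric uniform quantizer with a single per-tensor scale. The function names `quantize_int8` and `dequantize_int8` are hypothetical, and this is not the quantization scheme used in this repository or in the paper.

```python
from array import array


def quantize_int8(values):
    """Symmetric uniform quantization of floats to signed 8-bit integers.

    Assumption: one scale per tensor, chosen so the largest magnitude maps to 127.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the int8 range used here.
    levels = (max(-127, min(127, round(v / scale))) for v in values)
    return array("b", levels), scale


def dequantize_int8(q, scale):
    """Map 8-bit levels back to approximate float values."""
    return [scale * level for level in q]


# Illustrative values only; they are not taken from any trained model.
weights = [0.5, -1.27, 0.003, 1.0, -0.75, 0.2]

full = array("f", weights)          # 32-bit float storage
q, scale = quantize_int8(weights)   # 8-bit integer storage plus one scale
restored = dequantize_int8(q, scale)

# Ratio of bytes needed for the two representations (the scale is ignored).
compression = (len(full) * full.itemsize) / (len(q) * q.itemsize)
# Rounding to the nearest level bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"compression: {compression}x, scale: {scale}, max error: {max_error}")
```

The per-element error stays within half a quantization step (`scale / 2`), which is the price paid for storing each value in one byte instead of four.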

Experiments

This repository currently contains the experiments reported in the paper for low precision CHOCO-SGD and Deep-Squeeze.
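For readers unfamiliar with CHOCO-SGD, the sketch below illustrates the compressed gossip averaging step it builds on, as commonly described in the literature: each node keeps a public copy of its model, sends only a compressed difference between its model and that copy, and then mixes toward its neighbors' public copies. It assumes a ring of four nodes, uniform mixing weights, top-k sparsification as the compressor, and no gradient step. All names (`top_k`, `choco_gossip_round`, `gamma`) are hypothetical, and this is not the repository's implementation.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def choco_gossip_round(x, x_hat, neighbors, weight, gamma, k):
    """One round of compressed gossip over all nodes.

    x[i] is node i's private model, x_hat[i] is the public copy that its
    neighbors track. Only the compressed difference would be communicated.
    """
    n = len(x)
    # Each node compresses the gap between its model and its public copy.
    q = [top_k([a - b for a, b in zip(x[i], x_hat[i])], k) for i in range(n)]
    # Every public copy is updated with the compressed message.
    new_hat = [[h + d for h, d in zip(x_hat[i], q[i])] for i in range(n)]
    # Each node moves toward its neighbors' public copies.
    new_x = []
    for i in range(n):
        xi = list(x[i])
        for j in neighbors[i]:
            for d in range(len(xi)):
                xi[d] += gamma * weight * (new_hat[j][d] - new_hat[i][d])
        new_x.append(xi)
    return new_x, new_hat


def column_mean(models):
    """Average of all node models, coordinate by coordinate."""
    n = len(models)
    return [sum(m[d] for m in models) / n for d in range(len(models[0]))]


# Illustrative setup: 4 nodes on a ring, 3 parameters each.
x = [[1.0, 2.0, 3.0], [4.0, 0.0, -1.0], [0.5, 0.5, 0.5], [-2.0, 3.0, 1.0]]
x_hat = [[0.0] * 3 for _ in x]
n = len(x)
neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]

initial_mean = column_mean(x)
for _ in range(50):
    x, x_hat = choco_gossip_round(x, x_hat, neighbors, weight=1 / 3, gamma=0.2, k=1)
final_mean = column_mean(x)

print("initial mean:", initial_mean)
print("final mean:  ", final_mean)
```

Because the mixing weights are symmetric, the exchanged corrections cancel across each edge, so the network-wide average of the models is preserved even though each message carries only one of three coordinates. How quickly the nodes reach consensus depends on the step size `gamma` and on how aggressive the compressor is.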

Datasets

CIFAR-10
CIFAR-100
Imagenette

Models

ResNet
VGG
MobileNet

Run the experiments with:

sh run.sh

References

This code builds on Facebook's Stochastic Gradient Push repository for the decentralized learning setup. We extended the code base to include Deep-Squeeze, CHOCO-SGD, Quasi-Global Momentum, and 8-bit integer training.

Citation

@inproceedings{
aketi2021,
title={Low Precision Decentralized Distributed Training with Heterogenous Data},
author={Sai Aparna Aketi and Sangamesh Kodge and Kaushik Roy},
booktitle={arXiv pre-print},
year={2021},
url={https://arxiv.org/abs/2111.09389}
}

Owner

Aparna Aketi
