Code repository for "Reducing Underflow in Mixed Precision Training by Gradient Scaling" presented at IJCAI '20

Overview

Reducing Underflow in Mixed Precision Training by Gradient Scaling

Python Package using Conda Code style: black codecov Total alerts Language grade: Python

This project implements the gradient scaling method to improve the performance of mixed precision training.

The old repository: https://github.com/ada-loss/ada-loss

@inproceedings{ijcai2020-404,
  title     = {Reducing Underflow in Mixed Precision Training by Gradient Scaling},
  author    = {Zhao, Ruizhe and Vogel, Brian and Ahmed, Tanvir and Luk, Wayne},
  booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
               Artificial Intelligence, {IJCAI-20}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},             
  editor    = {Christian Bessiere}	
  pages     = {2922--2928},
  year      = {2020},
  month     = {7},
  note      = {Main track}
  doi       = {10.24963/ijcai.2020/404},
  url       = {https://doi.org/10.24963/ijcai.2020/404},
}

Introduction

Loss scaling is a technique that scales up loss values to mitigate underflow caused by low precision data representation in backpropagated activation gradients. The original implementation uses a fixed loss scale value predetermined before training starts for all layers, which may not be optimal since the statistics of gradients change across layers and training epochs. Instead, our method calculates the loss scale value for each layer based on their runtime statistics.

Installation

We are using Anaconda to manage package dependencies:

conda create -f environment.yml
conda activate ada_loss

To install this project, please consider using this command:

pip install -e . # in the project root

Project structure

The structure of this project is as follows: the core of the adaptive loss scaling method is implemented in the ada_loss package; chainerlp provides the implementation of some baseline models; and models includes third party implementation of more complicated baseline models.

Usage

Example usage for chainer (other frameworks will be released later):

from ada_loss.chainer import AdaLossScaled
from ada_loss.chainer import transforms

# transform your link to support adaptive loss scaling
link = AdaLossScaled(link, transforms=[
    transforms.AdaLossTransformLinear(),
    transforms.AdaLossTransformConvolution2D(),
    # ...
])

It tries to convert links within the given link to ones that supports adaptive loss scaling based on the provided list of transforms. Adaptive loss scaled links are located under ada_loss.chainer.links. Transforms are extended based on AdaLossTransform in ada_loss.chainer.transforms.base and stored under ada_loss.chainer.transforms. For now, users are required to go through their link and specify explicitly transforms that should be taken.

Examples

Examples are located here.

Testing

Tests can be launched by calling pytest. Some tests are specified to be run on GPUs.

Owner
Ruizhe Zhao
Linking fire @ICComputing
Ruizhe Zhao
FaceOcc: A Diverse, High-quality Face Occlusion Dataset for Human Face Extraction

FaceExtraction FaceOcc: A Diverse, High-quality Face Occlusion Dataset for Human Face Extraction Occlusions often occur in face images in the wild, tr

16 Dec 14, 2022
Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree

This is a Python implementation of cover trees, a data structure for finding nearest neighbors in a general metric space (e.g., a 3D box with periodic

Patrick Varilly 28 Nov 25, 2022
This repo implements a 3D segmentation task for an airport baggage dataset.

3D CT Scan Segmentation With Occupancy Network This repo implements a 3D superresolution segmentation task for an airport baggage dataset. Our final p

Christoph Reich 2 Mar 28, 2022
Implementation of the paper: "SinGAN: Learning a Generative Model from a Single Natural Image"

SinGAN This is an unofficial implementation of SinGAN from someone who's been sitting right next to SinGAN's creator for almost five years. Please ref

35 Nov 10, 2022
FaceQgen: Semi-Supervised Deep Learning for Face Image Quality Assessment

FaceQgen FaceQgen: Semi-Supervised Deep Learning for Face Image Quality Assessment This repository is based on the paper: "FaceQgen: Semi-Supervised D

Javier Hernandez-Ortega 3 Aug 04, 2022
PyTorch code for the paper "Curriculum Graph Co-Teaching for Multi-target Domain Adaptation" (CVPR2021)

PyTorch code for the paper "Curriculum Graph Co-Teaching for Multi-target Domain Adaptation" (CVPR2021) This repo presents PyTorch implementation of M

Evgeny 79 Dec 19, 2022
A Python reference implementation of the CF data model

cfdm A Python reference implementation of the CF data model. References Compliance with FAIR principles Documentation https://ncas-cms.github.io/cfdm

NCAS CMS 25 Dec 13, 2022
Think Big, Teach Small: Do Language Models Distil Occam’s Razor?

Think Big, Teach Small: Do Language Models Distil Occam’s Razor? Software related to the paper "Think Big, Teach Small: Do Language Models Distil Occa

0 Dec 07, 2021
The official repository for paper ''Domain Generalization for Vision-based Driving Trajectory Generation'' submitted to ICRA 2022

DG-TrajGen The official repository for paper ''Domain Generalization for Vision-based Driving Trajectory Generation'' submitted to ICRA 2022. Our Meth

Wang 25 Sep 26, 2022
CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

Citylearn Challenge This is the PyTorch implementation for PikaPika team, CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energ

bigAIdream projects 10 Oct 10, 2022
Training DiffWave using variational method from Variational Diffusion Models.

Variational DiffWave Training DiffWave using variational method from Variational Diffusion Models. Quick Start python train_distributed.py discrete_10

Chin-Yun Yu 26 Dec 13, 2022
RTSeg: Real-time Semantic Segmentation Comparative Study

Real-time Semantic Segmentation Comparative Study The repository contains the official TensorFlow code used in our papers: RTSEG: REAL-TIME SEMANTIC S

Mennatullah Siam 592 Nov 18, 2022
Semi-Supervised Signed Clustering Graph Neural Network (and Implementation of Some Spectral Methods)

SSSNET SSSNET: Semi-Supervised Signed Network Clustering For details, please read our paper. Environment Setup Overview The project has been tested on

Yixuan He 9 Nov 24, 2022
hipCaffe: the HIP port of Caffe

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Cent

ROCm Software Platform 126 Dec 05, 2022
[Link]deep_portfolo - Use Reforcemet earg ad Supervsed learg to Optmze portfolo allocato []

rl_portfolio This Repository uses Reinforcement Learning and Supervised learning to Optimize portfolio allocation. The goal is to make profitable agen

Deepender Singla 165 Dec 02, 2022
git《FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding》(CVPR 2021) GitHub: [fig8]

FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding (CVPR 2021) This repo contains the implementation of our state-of-the-art fewshot ob

233 Dec 29, 2022
SASM - simple crossplatform IDE for NASM, MASM, GAS and FASM assembly languages

SASM (SimpleASM) - простая кроссплатформенная среда разработки для языков ассемблера NASM, MASM, GAS, FASM с подсветкой синтаксиса и отладчиком. В SA

Dmitriy Manushin 5.6k Jan 06, 2023
PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning"

deepGCFX PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning" Pr

Thilini Cooray 4 Aug 11, 2022
CSPML (crystal structure prediction with machine learning-based element substitution)

CSPML (crystal structure prediction with machine learning-based element substitution) CSPML is a unique methodology for the crystal structure predicti

8 Dec 20, 2022
PyTorch implementations of the beta divergence loss.

Beta Divergence Loss - PyTorch Implementation This repository contains code for a PyTorch implementation of the beta divergence loss. Dependencies Thi

Billy Carson 7 Nov 09, 2022