A tf.keras implementation of Facebook AI's MadGrad optimization algorithm

Last update: Aug 18, 2022

Overview

MADGRAD Optimization Algorithm For TensorFlow

This package implements the MadGrad algorithm proposed in "Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization" (Aaron Defazio and Samy Jelassi, 2021).

Table of Contents

About The Project
Getting Started
- Prerequisites
- Installation
Usage
Contributing
License
Contact
Citations

About The Project

The MadGrad optimization algorithm combines dual averaging of gradients with momentum-based adaptivity and, according to its authors, attains results that match or outperform Adam and SGD with momentum. This project offers a TensorFlow implementation of the algorithm along with a few usage examples and tests.
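To make the idea of dual averaging with momentum concrete, here is a minimal NumPy sketch of a MADGRAD-style update loop, based on our reading of the update rule in the cited paper. It is not this package's code: the function name madgrad_minimize, the default values, and the toy quadratic problem are all hypothetical and chosen only for illustration.

```python
import numpy as np


def madgrad_minimize(grad_fn, x0, lr=0.1, momentum=0.9, eps=1e-6, steps=200):
    """Run a MADGRAD-style loop (illustrative sketch, not the package's code)."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    s = np.zeros_like(x0)   # dual average: weighted sum of gradients
    nu = np.zeros_like(x0)  # weighted sum of squared gradients
    c = 1.0 - momentum      # averaging weight used in the momentum step
    for k in range(steps):
        g = grad_fn(x)
        lam = lr * np.sqrt(k + 1)         # step weight grows with sqrt(k + 1)
        s += lam * g
        nu += lam * g * g
        z = x0 - s / (np.cbrt(nu) + eps)  # dual-averaging iterate, cube-root scaling
        x = (1.0 - c) * x + c * z         # momentum as an average of x and z
    return x


# Hypothetical toy problem: minimize 0.5 * ||x - target||^2
target = np.array([3.0, -2.0])


def loss(x):
    return 0.5 * np.sum((x - target) ** 2)


start = np.zeros(2)
result = madgrad_minimize(lambda x: x - target, start)
print("initial loss:", loss(start), "final loss:", loss(result))
```

Note that each iterate is anchored to the starting point x0 and driven by the accumulated gradient sum, rather than by the latest gradient alone; this is what distinguishes dual averaging from an SGD-style step.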

Prerequisites

Prerequisites can be installed separately from the requirements.txt file:

pip install -r requirements.txt

Installation

This project is built with Python 3 and can be installed directly with pip:

pip install tf-madgrad

Usage

To use the optimizer in any tf.keras model, import the MadGrad class and pass an instance to the model's compile function.

import tensorflow as tf
from madgrad import MadGrad

# Create the architecture (shape, classes, activation and loss are defined by you)
inp = tf.keras.layers.Input(shape=shape)
x = ...  # intermediate layers applied to inp
# Call the output layer on the last intermediate tensor
op = tf.keras.layers.Dense(classes, activation=activation)(x)

# Instantiate the model
model = tf.keras.models.Model(inp, op)

# Pass the MadGrad optimizer to the compile function
model.compile(optimizer=MadGrad(lr=0.01), loss=loss)

# Fit the keras model as normal
model.fit(...)

This implementation also supports distributed training through tf.distribute strategies.

See an MNIST example here

Contributing

Any and all contributions are welcome. Please raise an issue if the optimizer gives incorrect results or crashes unexpectedly during training.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Feel free to reach out with any issues or requests related to this implementation:

Darshan Deshpande - Email | LinkedIn

Citations

@misc{defazio2021adaptivity,
      title={Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization}, 
      author={Aaron Defazio and Samy Jelassi},
      year={2021},
      eprint={2101.11075},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
