A collection of optimizer implementations in PyTorch

Overview

pytorch-optimizer

A collection of optimizer implementations in PyTorch, written with clean code and strict typing, along with several useful optimization ideas.
Most of the implementations are based on the original papers, with some tweaks added.
Highly inspired by pytorch-optimizer.

Documentation

https://pytorch-optimizers.readthedocs.io/en/latest/

Usage

Install

$ pip3 install pytorch-optimizer

Simple Usage

from pytorch_optimizer import Ranger21

...
model = YourModel()
optimizer = Ranger21(model.parameters())
...

for input, output in data:
  optimizer.zero_grad()
  loss = loss_function(output, model(input))
  loss.backward()
  optimizer.step()

Supported Optimizers

• AdaBelief: Adapting Stepsizes by the Belief in Observed Gradients (paper: https://arxiv.org/abs/2010.07468)
• AdaBound: Adaptive Gradient Methods with Dynamic Bound of Learning Rate (paper: https://openreview.net/forum?id=Bkg3g2R9FX)
• AdaHessian: An Adaptive Second Order Optimizer for Machine Learning (paper: https://arxiv.org/abs/2006.00719)
• AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights (paper: https://arxiv.org/abs/2006.08217)
• diffGrad: An Optimization Method for Convolutional Neural Networks (paper: https://arxiv.org/abs/1909.11015v3)
• MADGRAD: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization (paper: https://arxiv.org/abs/2101.11075)
• RAdam: On the Variance of the Adaptive Learning Rate and Beyond (paper: https://arxiv.org/abs/1908.03265)
• Ranger: a synergistic optimizer combining RAdam, Lookahead, and Gradient Centralization (link: https://bit.ly/3zyspC3)
• Ranger21: a synergistic deep learning optimizer (paper: https://arxiv.org/abs/2106.13731)

Useful Resources

Several optimization ideas to regularize and stabilize training. Most of these ideas are applied in the Ranger21 optimizer.

Most of the figures below are taken from the Ranger21 paper.

• Adaptive Gradient Clipping
• Gradient Centralization
• Softplus Transformation
• Gradient Normalization
• Norm Loss
• Positive-Negative Momentum
• Linear learning rate warmup
• Stable weight decay
• Explore-exploit learning rate schedule
• Lookahead
• Chebyshev learning rate schedule
• (Adaptive) Sharpness-Aware Minimization
• On the Convergence of Adam and Beyond

Adaptive Gradient Clipping

This idea was originally proposed in the NFNet (Normalizer-Free Networks) paper.
AGC (Adaptive Gradient Clipping) clips gradients based on the unit-wise ratio of gradient norms to parameter norms.
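
A rough sketch of the clipping rule in PyTorch is shown below. It is not the library's exact implementation: the paper clips unit-wise (per output unit), while this sketch uses a single norm per parameter tensor, and clip_factor / eps are only illustrative defaults.

import torch

def adaptive_gradient_clip_(parameters, clip_factor=0.01, eps=1e-3):
    # clip each gradient so that ||grad|| <= clip_factor * max(||param||, eps)
    for p in parameters:
        if p.grad is None:
            continue
        param_norm = p.detach().norm().clamp_(min=eps)
        max_norm = clip_factor * param_norm
        grad_norm = p.grad.detach().norm()
        if grad_norm > max_norm:
            p.grad.detach().mul_(max_norm / (grad_norm + 1e-6))

Call it between loss.backward() and optimizer.step().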

Gradient Centralization

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/gradient_centralization.png

Gradient Centralization (GC) operates directly on gradients by centralizing the gradient to have zero mean.
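
A minimal sketch of the operation, assuming it is applied to the gradient of each multi-dimensional weight just before the optimizer update:

import torch

def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    # subtract the mean over every dimension except the first, so the gradient
    # of each output unit has zero mean; 1-D tensors (biases) are left unchanged
    if grad.dim() > 1:
        grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    return grad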

Softplus Transformation

Running the final variance denominator through the softplus function lifts extremely tiny values to keep them viable.
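
A sketch of the idea inside an Adam-style update; beta_softplus=50 here is only an illustrative value, not necessarily the library's default:

import torch
import torch.nn.functional as F

def softplus_denominator(exp_avg_sq: torch.Tensor, beta_softplus: float = 50.0) -> torch.Tensor:
    # replace the usual sqrt(v) + eps denominator with softplus(sqrt(v), beta),
    # which smoothly lifts values below roughly log(2) / beta instead of letting
    # the step size blow up
    return F.softplus(exp_avg_sq.sqrt(), beta=beta_softplus)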

Gradient Normalization

Norm Loss

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/norm_loss.png

Positive-Negative Momentum

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/positive_negative_momentum.png

Linear learning rate warmup

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/linear_lr_warmup.png
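
A sketch of the untuned linear warmup rule of thumb (warm up over roughly 2 / (1 - beta2) steps) from the Ma & Yarats paper cited below; the function name and defaults are illustrative only:

import math

def linear_warmup_lr(step: int, base_lr: float, beta2: float = 0.999) -> float:
    # scale the learning rate linearly from 0 to base_lr over the warmup period;
    # about 2 / (1 - beta2) steps (~2,000 for beta2 = 0.999) is the untuned default
    warmup_steps = math.ceil(2.0 / (1.0 - beta2))
    return base_lr * min(1.0, step / warmup_steps)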

Stable weight decay

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/stable_weight_decay.png

Explore-exploit learning rate schedule

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png

Lookahead

k steps forward, 1 step back. Lookahead keeps an exponential moving average of the weights, which is updated and substituted for the current weights every k_lookahead steps (5 by default).
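
A minimal sketch of that update rule wrapped around a plain inner optimizer; k and alpha are the usual Lookahead hyperparameters and the helper name is illustrative, not the library's API:

import torch
from torch import nn

def lookahead_step(model: nn.Module, slow_weights, step: int, k: int = 5, alpha: float = 0.5):
    # every k steps, move the slow (EMA-like) weights toward the fast weights
    # and substitute them back into the model
    if step % k == 0:
        for slow, fast in zip(slow_weights, model.parameters()):
            slow.add_(alpha * (fast.detach() - slow))
            fast.data.copy_(slow)

# usage: slow_weights = [p.detach().clone() for p in model.parameters()],
# then call lookahead_step(model, slow_weights, step) after each optimizer.step()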

Chebyshev learning rate schedule

Acceleration via Fractal Learning Rate Schedules

(Adaptive) Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
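
SAM needs two forward-backward passes per update, and this library raises a "requires closure" error if step() is called without one (see the issues section below). A sketch of the explicit two-pass pattern, using the first_step/second_step calls that also appear in the issue discussion, is shown here; the constructor arguments follow the reference SAM implementation and are assumptions, not a verified signature:

import torch
from torch import nn
from pytorch_optimizer import SAM

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
# lr and momentum are passed through to the base optimizer (illustrative values)
optimizer = SAM(model.parameters(), torch.optim.SGD, lr=0.1, momentum=0.9)

inputs, targets = torch.randn(8, 10), torch.randint(0, 2, (8,))

# first forward-backward pass: perturb the weights toward the worst case nearby
criterion(model(inputs), targets).backward()
optimizer.first_step(zero_grad=True)

# second forward-backward pass: update using gradients at the perturbed weights
criterion(model(inputs), targets).backward()
optimizer.second_step(zero_grad=True)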

On the Convergence of Adam and Beyond

Citations

AdamP

@inproceedings{heo2021adamp,
  title={AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights},
  author={Heo, Byeongho and Chun, Sanghyuk and Oh, Seong Joon and Han, Dongyoon and Yun, Sangdoo and Kim, Gyuwan and Uh, Youngjung and Ha, Jung-Woo},
  year={2021},
  booktitle={International Conference on Learning Representations (ICLR)},
}

Adaptive Gradient Clipping (AGC)

@article{brock2021high,
  author={Andrew Brock and Soham De and Samuel L. Smith and Karen Simonyan},
  title={High-Performance Large-Scale Image Recognition Without Normalization},
  journal={arXiv preprint arXiv:2102.06171},
  year={2021}
}

Chebyshev LR Schedules

@article{agarwal2021acceleration,
  title={Acceleration via Fractal Learning Rate Schedules},
  author={Agarwal, Naman and Goel, Surbhi and Zhang, Cyril},
  journal={arXiv preprint arXiv:2103.01338},
  year={2021}
}

Gradient Centralization (GC)

@inproceedings{yong2020gradient,
  title={Gradient centralization: A new optimization technique for deep neural networks},
  author={Yong, Hongwei and Huang, Jianqiang and Hua, Xiansheng and Zhang, Lei},
  booktitle={European Conference on Computer Vision},
  pages={635--652},
  year={2020},
  organization={Springer}
}

Lookahead

@article{zhang2019lookahead,
  title={Lookahead optimizer: k steps forward, 1 step back},
  author={Zhang, Michael R and Lucas, James and Hinton, Geoffrey and Ba, Jimmy},
  journal={arXiv preprint arXiv:1907.08610},
  year={2019}
}

RAdam

@inproceedings{liu2019radam,
  author={Liu, Liyuan and Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Han, Jiawei},
  booktitle={Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020)},
  month={April},
  title={On the Variance of the Adaptive Learning Rate and Beyond},
  year={2020}
}

Norm Loss

@inproceedings{georgiou2021norm,
  title={Norm Loss: An efficient yet effective regularization method for deep neural networks},
  author={Georgiou, Theodoros and Schmitt, Sebastian and B{\"a}ck, Thomas and Chen, Wei and Lew, Michael},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  pages={8812--8818},
  year={2021},
  organization={IEEE}
}

Positive-Negative Momentum

@article{xie2021positive,
  title={Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization},
  author={Xie, Zeke and Yuan, Li and Zhu, Zhanxing and Sugiyama, Masashi},
  journal={arXiv preprint arXiv:2103.17182},
  year={2021}
}

Explore-Exploit learning rate schedule

@article{iyer2020wide,
  title={Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule},
  author={Iyer, Nikhil and Thejas, V and Kwatra, Nipun and Ramjee, Ramachandran and Sivathanu, Muthian},
  journal={arXiv preprint arXiv:2003.03977},
  year={2020}
}

Linear learning-rate warm-up

@article{ma2019adequacy,
  title={On the adequacy of untuned warmup for adaptive optimization},
  author={Ma, Jerry and Yarats, Denis},
  journal={arXiv preprint arXiv:1910.04209},
  volume={7},
  year={2019}
}

Stable weight decay

@article{xie2020stable,
  title={Stable weight decay regularization},
  author={Xie, Zeke and Sato, Issei and Sugiyama, Masashi},
  journal={arXiv preprint arXiv:2011.11152},
  year={2020}
}

Softplus transformation

@article{tong2019calibrating,
  title={Calibrating the adaptive learning rate to improve convergence of adam},
  author={Tong, Qianqian and Liang, Guannan and Bi, Jinbo},
  journal={arXiv preprint arXiv:1908.00700},
  year={2019}
}

MADGRAD

@article{defazio2021adaptivity,
  title={Adaptivity without compromise: a momentumized, adaptive, dual averaged gradient method for stochastic optimization},
  author={Defazio, Aaron and Jelassi, Samy},
  journal={arXiv preprint arXiv:2101.11075},
  year={2021}
}

AdaHessian

@article{yao2020adahessian,
  title={ADAHESSIAN: An adaptive second order optimizer for machine learning},
  author={Yao, Zhewei and Gholami, Amir and Shen, Sheng and Mustafa, Mustafa and Keutzer, Kurt and Mahoney, Michael W},
  journal={arXiv preprint arXiv:2006.00719},
  year={2020}
}

AdaBound

@inproceedings{Luo2019AdaBound,
  author = {Luo, Liangchen and Xiong, Yuanhao and Liu, Yan and Sun, Xu},
  title = {Adaptive Gradient Methods with Dynamic Bound of Learning Rate},
  booktitle = {Proceedings of the 7th International Conference on Learning Representations},
  month = {May},
  year = {2019},
  address = {New Orleans, Louisiana}
}

AdaBelief

@article{zhuang2020adabelief,
  title={Adabelief optimizer: Adapting stepsizes by the belief in observed gradients},
  author={Zhuang, Juntang and Tang, Tommy and Ding, Yifan and Tatikonda, Sekhar and Dvornek, Nicha and Papademetris, Xenophon and Duncan, James S},
  journal={arXiv preprint arXiv:2010.07468},
  year={2020}
}

Sharpness-Aware Minimization

@article{foret2020sharpness,
  title={Sharpness-aware minimization for efficiently improving generalization},
  author={Foret, Pierre and Kleiner, Ariel and Mobahi, Hossein and Neyshabur, Behnam},
  journal={arXiv preprint arXiv:2010.01412},
  year={2020}
}

Adaptive Sharpness-Aware Minimization

@article{kwon2021asam,
  title={ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks},
  author={Kwon, Jungmin and Kim, Jeongseop and Park, Hyunseo and Choi, In Kwon},
  journal={arXiv preprint arXiv:2102.11600},
  year={2021}
}

diffGrad

@article{dubey2019diffgrad,
  title={diffgrad: An optimization method for convolutional neural networks},
  author={Dubey, Shiv Ram and Chakraborty, Soumendu and Roy, Swalpa Kumar and Mukherjee, Snehasis and Singh, Satish Kumar and Chaudhuri, Bidyut Baran},
  journal={IEEE transactions on neural networks and learning systems},
  volume={31},
  number={11},
  pages={4500--4511},
  year={2019},
  publisher={IEEE}
}

On the Convergence of Adam and Beyond

@article{reddi2019convergence,
  title={On the convergence of adam and beyond},
  author={Reddi, Sashank J and Kale, Satyen and Kumar, Sanjiv},
  journal={arXiv preprint arXiv:1904.09237},
  year={2019}
}

Author

Hyeongchan Kim / @kozistr

Comments
  • Sharpness Aware Minimization (SAM) requires closure

    Hi, thank you so much for your repo. I am using the SAM optimizer but I am facing this error; how can I fix it?

    RuntimeError: [-] Sharpness Aware Minimization (SAM) requires closure

    question 
    opened by manza-ari 21
  •  Trying to use SAM optimizer for Random Sampling Image Classification

    I am trying to use the SAM optimizer. When I call the backward function twice in train_epoch() (the second forward-backward pass), it gives me an error; otherwise it works fine.

    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 100]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

    def train_epoch(models, criterion, optimizers, dataloaders):
        models.train()
        global iters
        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()
            iters += 1
            optimizers.zero_grad()
            # pdb.set_trace()
            scores, _, features = models(inputs)

            target_loss = criterion(scores, labels)
            m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
            loss = m_backbone_loss

            # ----------------- SAM Optimizer -------------------
            # first forward-backward pass
            criterion(models(inputs)[0], labels)
            loss.backward(retain_graph=True)
            optimizers.first_step(zero_grad=True)

            # second forward-backward pass
            criterion(models(inputs)[0], labels)
            loss.backward(retain_graph=True)
            optimizers.second_step(zero_grad=True)
        # return loss
    
    question 
    opened by manza-ari 14
  • Ranger21 does not work

    Below is the trace when I try to use Ranger21; other optimizers work as they should.

    c:\users\g\appdata\local\programs\python\python38\lib\site-packages\pytorch_optimizer\ranger21.py in __init__(self, params, lr, beta0, betas, use_softplus, beta_softplus, num_iterations, num_warm_up_iterations, num_warm_down_iterations, warm_down_min_lr, agc_clipping_value, agc_eps, centralize_gradients, normalize_gradients, lookahead_merge_time, lookahead_blending_alpha, weight_decay, norm_loss_factor, eps)
        114     # warmup iterations
        115     self.num_warm_up_iterations: int = (
    --> 116         self.build_warm_up_iterations(num_iterations, betas[1])
        117         if num_warm_up_iterations is None
        118         else num_warm_up_iterations

    c:\users\g\appdata\local\programs\python\python38\lib\site-packages\pytorch_optimizer\ranger21.py in build_warm_up_iterations(total_iterations, beta2, warm_up_pct)
        150     def build_warm_up_iterations(total_iterations: int, beta2: float, warm_up_pct: float = 0.22) -> int:
        151         warm_up_iterations: int = math.ceil(2.0 / (1.0 - beta2))  # default un-tuned linear warmup
    --> 152         beta_pct: float = warm_up_iterations / total_iterations
        153         if beta_pct > 0.45:
        154             return int(warm_up_pct * total_iterations)

    TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

    bug 
    opened by BaconGabe 3
  • [Feature] support torch.hub.load

    Problem (Why?)

    loading optimizers via torch.hub.load

    Solution (What/How?)

    It is inconvenient to define functions one by one, so I used a trick with globals().

    Other changes (bug fixes, small refactors)

    Change Callable to Type[Optimizer]; perhaps this is what you intended (see typing.Type).

    Notes

    example colab

    import torch
    
    Adan = torch.hub.load("Bing-su/pytorch_optimizer:hubconf", "Adan")
    
    feature size/S 
    opened by Bing-su 2
  • [Test] Increase the test coverage

    Problem (Why?)

    heading toward 98% test coverage

    Solution (What/How?)

    • [x] update test_no_gradient

    Other changes (bug fixes, small refactors)

    • [x] fix API documentation

    Notes

    nope

    documentation enhancement size/XL 
    opened by kozistr 1
  • [Refactor/Docs] Organize Class docstring & Add custom exceptions

    Problem (Why?)

    there's no proper exception class (e.g. no sparse gradient, zero parameter size)

    Solution (What/How?)

    • [x] register custom exceptions
    • [x] refactor the docstrings
    • [x] support gradient centralization for Adai optimizer
    • [x] support AdamD debias for AdaPNM optimizer
    • [x] fix SAM optimizer
    • [x] add API documentation

    Other changes (bug fixes, small refactors)

    • [x] wrapper to the module (not __init__) in hubconf.py
    • [x] add a citation to README.rst

    Notes

    to v2.1.1

    bug documentation enhancement refactoring size/XXL 
    opened by kozistr 1
  • [Feature] Implement `Adai` optimizer

    Problem (Why?)

    Implement Adai optimizer

    Solution (What/How?)

    • [x] implement Adai & Adai v2 optimizers

    Other changes (bug fixes, small refactors)

    nope

    Notes

    version to v2.1.0

    feature size/L 
    opened by kozistr 1
  • [CI] Reduce `num_iterations` to speed up the testing

    Problem (Why?)

    A smaller num_iterations is enough to train the test model, while the testing currently takes about 2 minutes.

    Solution (What/How?)

    • [x] reduce num_iterations to 100-200, which is enough to train the model (testing time drops from about 2 minutes to 1 minute)

    Other changes (bug fixes, small refactors)

    • [x] explicit torch package to CPU version

    Notes

    nope

    enhancement size/L 
    opened by kozistr 1
  • [CI] Add `pytest-testmon` to reduce testing time

    Problem (Why?)

    Run only the tests that are needed, not the whole suite.

    Solution (What/How?)

    • [x] add pytest-testmon

    Other changes (bug fixes, small refactors)

    nope

    Notes

    nope

    dependencies size/M 
    opened by kozistr 1
  • [Build] Upgrade Python version to 3.11 for CI/CD pipeline

    Problem (Why?)

    just upgrading

    Solution (What/How?)

    • [x] Python version to 3.11 for CI/CD pipeline
    • [x] github action
      • [x] codecov/codecov-action to v3
      • [x] actions/setup-python to v4
    • [x] remove CUDA-related packages from the dependencies manually
    • [x] upgrade dev dependencies
    • [x] replace lint.py with pylint built-in option (fail-under)
    • [x] update .pylintrc

    Other changes (bug fixes, small refactors)

    nope

    Notes

    nope

    dependencies size/L 
    opened by kozistr 1
  • [Build] Bump setuptools from 65.5.0 to 65.5.1

    Bumps setuptools from 65.5.0 to 65.5.1.

    Changelog

    Sourced from setuptools's changelog.

    v65.5.1

    Misc

    • #3638: Drop a test dependency on the mock package, always use unittest.mock (by hroncok)
    • #3659: Fixed REDoS vector in package_index.
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies size/XS 
    opened by dependabot[bot] 1
  • Versions of codes that work with half precision models

    Hi I just discovered your repo and I would like to try it to fine-tune my ParlAI blenderbot2 (see https://github.com/facebookresearch/ParlAI) model. However, I am running the model in FP16 precision to make better use of my GPU. ParlAI has versions of a few optimizers that can use FP16 models, and I have tried installing a couple of other optimizers that can also work with FP16 models by casting the state parameters and gradients to FP32 within the optimizer, determining the new state parameters with FP32 accuracy, and recasting the state parameters back to FP16 for updating the model. If you had a version of your library that automatically did this, it would greatly simplify its use with FP16 precision models. Thanks!

    P.S. It looks like adabelief, radam, and diffgrad do something like this, but not in a consistent way.

    feature request 
    opened by sjscotti 1
Releases: v2.1.1