A collection of optimizer implementations in PyTorch

Overview

pytorch-optimizer

A collection of optimizer implementations in PyTorch, written with clean code and strict typing, along with several useful optimization ideas.
Most of the implementations are based on the original papers, with some tweaks added.
Highly inspired by pytorch-optimizer.

Documentation

https://pytorch-optimizers.readthedocs.io/en/latest/

Usage

Install

$ pip3 install pytorch-optimizer

Simple Usage

from pytorch_optimizer import Ranger21

...
model = YourModel()
optimizer = Ranger21(model.parameters())
...

for input, output in data:
  optimizer.zero_grad()
  loss = loss_function(output, model(input))
  loss.backward()
  optimizer.step()

Supported Optimizers

• AdaBelief: Adapting Stepsizes by the Belief in Observed Gradients (paper: https://arxiv.org/abs/2010.07468)
• AdaBound: Adaptive Gradient Methods with Dynamic Bound of Learning Rate (paper: https://openreview.net/forum?id=Bkg3g2R9FX)
• AdaHessian: An Adaptive Second Order Optimizer for Machine Learning (paper: https://arxiv.org/abs/2006.00719)
• AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights (paper: https://arxiv.org/abs/2006.08217)
• diffGrad: An Optimization Method for Convolutional Neural Networks (paper: https://arxiv.org/abs/1909.11015v3)
• MADGRAD: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization (paper: https://arxiv.org/abs/2101.11075)
• RAdam: On the Variance of the Adaptive Learning Rate and Beyond (paper: https://arxiv.org/abs/1908.03265)
• Ranger: a synergistic optimizer combining RAdam, Lookahead, and Gradient Centralization (link: https://bit.ly/3zyspC3)
• Ranger21: a synergistic deep learning optimizer (paper: https://arxiv.org/abs/2106.13731)

Useful Resources

Several optimization ideas to regularize and stabilize training. Most of these ideas are applied in the Ranger21 optimizer.

Most of the figures below are taken from the Ranger21 paper.

• Adaptive Gradient Clipping
• Gradient Centralization
• Softplus Transformation
• Gradient Normalization
• Norm Loss
• Positive-Negative Momentum
• Linear learning rate warmup
• Stable weight decay
• Explore-exploit learning rate schedule
• Lookahead
• Chebyshev learning rate schedule
• (Adaptive) Sharpness-Aware Minimization
• On the Convergence of Adam and Beyond

Adaptive Gradient Clipping

This idea was originally proposed in the NFNet (Normalizer-Free Networks) paper.
AGC (Adaptive Gradient Clipping) clips gradients based on the unit-wise ratio of gradient norms to parameter norms.
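
A rough sketch of the clipping rule in PyTorch is shown below. It is not the library's exact implementation: the paper clips unit-wise (per output unit), while this sketch uses a single norm per parameter tensor, and clip_factor / eps are only illustrative defaults.

import torch

def adaptive_gradient_clip_(parameters, clip_factor=0.01, eps=1e-3):
    # clip each gradient so that ||grad|| <= clip_factor * max(||param||, eps)
    for p in parameters:
        if p.grad is None:
            continue
        param_norm = p.detach().norm().clamp_(min=eps)
        max_norm = clip_factor * param_norm
        grad_norm = p.grad.detach().norm()
        if grad_norm > max_norm:
            p.grad.detach().mul_(max_norm / (grad_norm + 1e-6))

Call it between loss.backward() and optimizer.step().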

Gradient Centralization

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/gradient_centralization.png

Gradient Centralization (GC) operates directly on gradients by centralizing the gradient to have zero mean.
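
A minimal sketch of the operation, assuming it is applied to the gradient of each multi-dimensional weight just before the optimizer update:

import torch

def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    # subtract the mean over every dimension except the first, so the gradient
    # of each output unit has zero mean; 1-D tensors (biases) are left unchanged
    if grad.dim() > 1:
        grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    return grad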

Softplus Transformation

Running the final variance denominator through the softplus function lifts extremely tiny values to keep them viable.
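
A sketch of the idea inside an Adam-style update; beta_softplus=50 here is only an illustrative value, not necessarily the library's default:

import torch
import torch.nn.functional as F

def softplus_denominator(exp_avg_sq: torch.Tensor, beta_softplus: float = 50.0) -> torch.Tensor:
    # replace the usual sqrt(v) + eps denominator with softplus(sqrt(v), beta),
    # which smoothly lifts values below roughly log(2) / beta instead of letting
    # the step size blow up
    return F.softplus(exp_avg_sq.sqrt(), beta=beta_softplus)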

Gradient Normalization

Norm Loss

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/norm_loss.png

Positive-Negative Momentum

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/positive_negative_momentum.png

Linear learning rate warmup

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/linear_lr_warmup.png
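
A sketch of the untuned linear warmup rule of thumb (warm up over roughly 2 / (1 - beta2) steps) from the Ma & Yarats paper cited below; the function name and defaults are illustrative only:

import math

def linear_warmup_lr(step: int, base_lr: float, beta2: float = 0.999) -> float:
    # scale the learning rate linearly from 0 to base_lr over the warmup period;
    # about 2 / (1 - beta2) steps (~2,000 for beta2 = 0.999) is the untuned default
    warmup_steps = math.ceil(2.0 / (1.0 - beta2))
    return base_lr * min(1.0, step / warmup_steps)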

Stable weight decay

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/stable_weight_decay.png

Explore-exploit learning rate schedule

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png

Lookahead

k steps forward, 1 step back. Lookahead keeps an exponential moving average of the weights, which is updated and substituted for the current weights every k_lookahead steps (5 by default).
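
A minimal sketch of that update rule wrapped around a plain inner optimizer; k and alpha are the usual Lookahead hyperparameters and the helper name is illustrative, not the library's API:

import torch
from torch import nn

def lookahead_step(model: nn.Module, slow_weights, step: int, k: int = 5, alpha: float = 0.5):
    # every k steps, move the slow (EMA-like) weights toward the fast weights
    # and substitute them back into the model
    if step % k == 0:
        for slow, fast in zip(slow_weights, model.parameters()):
            slow.add_(alpha * (fast.detach() - slow))
            fast.data.copy_(slow)

# usage: slow_weights = [p.detach().clone() for p in model.parameters()],
# then call lookahead_step(model, slow_weights, step) after each optimizer.step()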

Chebyshev learning rate schedule

Acceleration via Fractal Learning Rate Schedules

(Adaptive) Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
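
SAM needs two forward-backward passes per update, and this library raises a "requires closure" error if step() is called without one (see the issues section below). A sketch of the explicit two-pass pattern, using the first_step/second_step calls that also appear in the issue discussion, is shown here; the constructor arguments follow the reference SAM implementation and are assumptions, not a verified signature:

import torch
from torch import nn
from pytorch_optimizer import SAM

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
# lr and momentum are passed through to the base optimizer (illustrative values)
optimizer = SAM(model.parameters(), torch.optim.SGD, lr=0.1, momentum=0.9)

inputs, targets = torch.randn(8, 10), torch.randint(0, 2, (8,))

# first forward-backward pass: perturb the weights toward the worst case nearby
criterion(model(inputs), targets).backward()
optimizer.first_step(zero_grad=True)

# second forward-backward pass: update using gradients at the perturbed weights
criterion(model(inputs), targets).backward()
optimizer.second_step(zero_grad=True)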

On the Convergence of Adam and Beyond

Citations

AdamP

@inproceedings{heo2021adamp,
  title={AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights},
  author={Heo, Byeongho and Chun, Sanghyuk and Oh, Seong Joon and Han, Dongyoon and Yun, Sangdoo and Kim, Gyuwan and Uh, Youngjung and Ha, Jung-Woo},
  year={2021},
  booktitle={International Conference on Learning Representations (ICLR)},
}

Adaptive Gradient Clipping (AGC)

@article{brock2021high,
  author={Andrew Brock and Soham De and Samuel L. Smith and Karen Simonyan},
  title={High-Performance Large-Scale Image Recognition Without Normalization},
  journal={arXiv preprint arXiv:2102.06171},
  year={2021}
}

Chebyshev LR Schedules

@article{agarwal2021acceleration,
  title={Acceleration via Fractal Learning Rate Schedules},
  author={Agarwal, Naman and Goel, Surbhi and Zhang, Cyril},
  journal={arXiv preprint arXiv:2103.01338},
  year={2021}
}

Gradient Centralization (GC)

@inproceedings{yong2020gradient,
  title={Gradient centralization: A new optimization technique for deep neural networks},
  author={Yong, Hongwei and Huang, Jianqiang and Hua, Xiansheng and Zhang, Lei},
  booktitle={European Conference on Computer Vision},
  pages={635--652},
  year={2020},
  organization={Springer}
}

Lookahead

@article{zhang2019lookahead,
  title={Lookahead optimizer: k steps forward, 1 step back},
  author={Zhang, Michael R and Lucas, James and Hinton, Geoffrey and Ba, Jimmy},
  journal={arXiv preprint arXiv:1907.08610},
  year={2019}
}

RAdam

@inproceedings{liu2019radam,
  author={Liu, Liyuan and Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Han, Jiawei},
  booktitle={Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020)},
  month={April},
  title={On the Variance of the Adaptive Learning Rate and Beyond},
  year={2020}
}

Norm Loss

@inproceedings{georgiou2021norm,
  title={Norm Loss: An efficient yet effective regularization method for deep neural networks},
  author={Georgiou, Theodoros and Schmitt, Sebastian and B{\"a}ck, Thomas and Chen, Wei and Lew, Michael},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  pages={8812--8818},
  year={2021},
  organization={IEEE}
}

Positive-Negative Momentum

@article{xie2021positive,
  title={Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization},
  author={Xie, Zeke and Yuan, Li and Zhu, Zhanxing and Sugiyama, Masashi},
  journal={arXiv preprint arXiv:2103.17182},
  year={2021}
}

Explore-Exploit learning rate schedule

@article{iyer2020wide,
  title={Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule},
  author={Iyer, Nikhil and Thejas, V and Kwatra, Nipun and Ramjee, Ramachandran and Sivathanu, Muthian},
  journal={arXiv preprint arXiv:2003.03977},
  year={2020}
}

Linear learning-rate warm-up

@article{ma2019adequacy,
  title={On the adequacy of untuned warmup for adaptive optimization},
  author={Ma, Jerry and Yarats, Denis},
  journal={arXiv preprint arXiv:1910.04209},
  volume={7},
  year={2019}
}

Stable weight decay

@article{xie2020stable,
  title={Stable weight decay regularization},
  author={Xie, Zeke and Sato, Issei and Sugiyama, Masashi},
  journal={arXiv preprint arXiv:2011.11152},
  year={2020}
}

Softplus transformation

@article{tong2019calibrating,
  title={Calibrating the adaptive learning rate to improve convergence of adam},
  author={Tong, Qianqian and Liang, Guannan and Bi, Jinbo},
  journal={arXiv preprint arXiv:1908.00700},
  year={2019}
}

MADGRAD

@article{defazio2021adaptivity,
  title={Adaptivity without compromise: a momentumized, adaptive, dual averaged gradient method for stochastic optimization},
  author={Defazio, Aaron and Jelassi, Samy},
  journal={arXiv preprint arXiv:2101.11075},
  year={2021}
}

AdaHessian

@article{yao2020adahessian,
  title={ADAHESSIAN: An adaptive second order optimizer for machine learning},
  author={Yao, Zhewei and Gholami, Amir and Shen, Sheng and Mustafa, Mustafa and Keutzer, Kurt and Mahoney, Michael W},
  journal={arXiv preprint arXiv:2006.00719},
  year={2020}
}

AdaBound

@inproceedings{Luo2019AdaBound,
  author = {Luo, Liangchen and Xiong, Yuanhao and Liu, Yan and Sun, Xu},
  title = {Adaptive Gradient Methods with Dynamic Bound of Learning Rate},
  booktitle = {Proceedings of the 7th International Conference on Learning Representations},
  month = {May},
  year = {2019},
  address = {New Orleans, Louisiana}
}

AdaBelief

@article{zhuang2020adabelief,
  title={Adabelief optimizer: Adapting stepsizes by the belief in observed gradients},
  author={Zhuang, Juntang and Tang, Tommy and Ding, Yifan and Tatikonda, Sekhar and Dvornek, Nicha and Papademetris, Xenophon and Duncan, James S},
  journal={arXiv preprint arXiv:2010.07468},
  year={2020}
}

Sharpness-Aware Minimization

@article{foret2020sharpness,
  title={Sharpness-aware minimization for efficiently improving generalization},
  author={Foret, Pierre and Kleiner, Ariel and Mobahi, Hossein and Neyshabur, Behnam},
  journal={arXiv preprint arXiv:2010.01412},
  year={2020}
}

Adaptive Sharpness-Aware Minimization

@article{kwon2021asam,
  title={ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks},
  author={Kwon, Jungmin and Kim, Jeongseop and Park, Hyunseo and Choi, In Kwon},
  journal={arXiv preprint arXiv:2102.11600},
  year={2021}
}

diffGrad

@article{dubey2019diffgrad,
  title={diffgrad: An optimization method for convolutional neural networks},
  author={Dubey, Shiv Ram and Chakraborty, Soumendu and Roy, Swalpa Kumar and Mukherjee, Snehasis and Singh, Satish Kumar and Chaudhuri, Bidyut Baran},
  journal={IEEE transactions on neural networks and learning systems},
  volume={31},
  number={11},
  pages={4500--4511},
  year={2019},
  publisher={IEEE}
}

On the Convergence of Adam and Beyond

@article{reddi2019convergence,
  title={On the convergence of adam and beyond},
  author={Reddi, Sashank J and Kale, Satyen and Kumar, Sanjiv},
  journal={arXiv preprint arXiv:1904.09237},
  year={2019}
}

Author

Hyeongchan Kim / @kozistr

Comments
  • Sharpness Aware Minimization (SAM) requires closure

    Hi, thank you so much for your repo. I am using the SAM optimizer but I am facing this error; how can I fix it?

    RuntimeError: [-] Sharpness Aware Minimization (SAM) requires closure

    question 
    opened by manza-ari 21
  •  Trying to use SAM optimizer for Random Sampling Image Classification

    I am trying to use the SAM optimizer. When I call the backward function twice in train_epoch() (the second forward-backward pass), it gives me an error; otherwise it works fine.

    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 100]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

    def train_epoch(models, criterion, optimizers, dataloaders):
        models.train()
        global iters
        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()
            iters += 1
            optimizers.zero_grad()
            # pdb.set_trace()
            scores, _, features = models(inputs)

            target_loss = criterion(scores, labels)
            m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
            loss = m_backbone_loss

            # ----------------- SAM Optimizer -------------------
            # first forward-backward pass
            criterion(models(inputs)[0], labels)
            loss.backward(retain_graph=True)
            optimizers.first_step(zero_grad=True)

            # second forward-backward pass
            criterion(models(inputs)[0], labels)
            loss.backward(retain_graph=True)
            optimizers.second_step(zero_grad=True)
        # return loss
    
    question 
    opened by manza-ari 14
  • Ranger21 does not work

    Below is the trace when I try to use Ranger21; other optimizers work as they should.

    c:\users\g\appdata\local\programs\python\python38\lib\site-packages\pytorch_optimizer\ranger21.py in __init__(self, params, lr, beta0, betas, use_softplus, beta_softplus, num_iterations, num_warm_up_iterations, num_warm_down_iterations, warm_down_min_lr, agc_clipping_value, agc_eps, centralize_gradients, normalize_gradients, lookahead_merge_time, lookahead_blending_alpha, weight_decay, norm_loss_factor, eps)
        114     # warmup iterations
        115     self.num_warm_up_iterations: int = (
    --> 116         self.build_warm_up_iterations(num_iterations, betas[1])
        117         if num_warm_up_iterations is None
        118         else num_warm_up_iterations

    c:\users\g\appdata\local\programs\python\python38\lib\site-packages\pytorch_optimizer\ranger21.py in build_warm_up_iterations(total_iterations, beta2, warm_up_pct)
        150     def build_warm_up_iterations(total_iterations: int, beta2: float, warm_up_pct: float = 0.22) -> int:
        151         warm_up_iterations: int = math.ceil(2.0 / (1.0 - beta2))  # default un-tuned linear warmup
    --> 152         beta_pct: float = warm_up_iterations / total_iterations
        153         if beta_pct > 0.45:
        154             return int(warm_up_pct * total_iterations)

    TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

    bug 
    opened by BaconGabe 3
  • [Feature] support torch.hub.load

    Problem (Why?)

    loading optimizers via torch.hub.load

    Solution (What/How?)

    It is inconvenient to define functions one by one, so I used a trick with globals().

    Other changes (bug fixes, small refactors)

    Change Callable to Type[Optimizer]; perhaps this is what you intended (see typing.Type).

    Notes

    example colab

    import torch
    
    Adan = torch.hub.load("Bing-su/pytorch_optimizer:hubconf", "Adan")
    
    feature size/S 
    opened by Bing-su 2
  • [Test] Increase the test coverage

    Problem (Why?)

    heading toward 98% test coverage

    Solution (What/How?)

    • [x] update test_no_gradient

    Other changes (bug fixes, small refactors)

    • [x] fix API documentation

    Notes

    nope

    documentation enhancement size/XL 
    opened by kozistr 1
  • [Refactor/Docs] Organize Class docstring & Add custom exceptions

    Problem (Why?)

    there's no proper exception class (e.g. no sparse gradient, zero parameter size)

    Solution (What/How?)

    • [x] register custom exceptions
    • [x] refactor the docstrings
    • [x] support gradient centralization for Adai optimizer
    • [x] support AdamD debias for AdaPNM optimizer
    • [x] fix SAM optimizer
    • [x] add API documentation

    Other changes (bug fixes, small refactors)

    • [x] wrapper to the module (not __init__) in hubconf.py
    • [x] add a citation to README.rst

    Notes

    to v2.1.1

    bug documentation enhancement refactoring size/XXL 
    opened by kozistr 1
  • [Feature] Implement `Adai` optimizer

    Problem (Why?)

    Implement Adai optimizer

    Solution (What/How?)

    • [x] implement Adai & Adai v2 optimizers

    Other changes (bug fixes, small refactors)

    nope

    Notes

    version to v2.1.0

    feature size/L 
    opened by kozistr 1
  • [CI] Reduce `num_iterations` to speed up the testing

    Problem (Why?)

    A smaller num_iterations is enough to train the test model, while the testing currently takes about 2 minutes.

    Solution (What/How?)

    • [x] reduce num_iterations to 100-200, which is enough to train the model (testing time drops from about 2 minutes to 1 minute)

    Other changes (bug fixes, small refactors)

    • [x] explicit torch package to CPU version

    Notes

    nope

    enhancement size/L 
    opened by kozistr 1
  • [CI] Add `pytest-testmon` to reduce testing time

    Problem (Why?)

    Run only the tests that are needed, not the whole suite.

    Solution (What/How?)

    • [x] add pytest-testmon

    Other changes (bug fixes, small refactors)

    nope

    Notes

    nope

    dependencies size/M 
    opened by kozistr 1
  • [Build] Upgrade Python version to 3.11 for CI/CD pipeline

    Problem (Why?)

    just upgrading

    Solution (What/How?)

    • [x] Python version to 3.11 for CI/CD pipeline
    • [x] github action
      • [x] codecov/codecov-action to v3
      • [x] actions/setup-python to v4
    • [x] remove CUDA-related packages from the dependencies manually
    • [x] upgrade dev dependencies
    • [x] replace lint.py with pylint built-in option (fail-under)
    • [x] update .pylintrc

    Other changes (bug fixes, small refactors)

    nope

    Notes

    nope

    dependencies size/L 
    opened by kozistr 1
  • [Build] Bump setuptools from 65.5.0 to 65.5.1

    Bumps setuptools from 65.5.0 to 65.5.1.

    Changelog

    Sourced from setuptools's changelog.

    v65.5.1

    Misc

    • #3638: Drop a test dependency on the mock package, always use unittest.mock (by hroncok)
    • #3659: Fixed REDoS vector in package_index.
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies size/XS 
    opened by dependabot[bot] 1
  • Versions of codes that work with half precision models

    Hi I just discovered your repo and I would like to try it to fine-tune my ParlAI blenderbot2 (see https://github.com/facebookresearch/ParlAI) model. However, I am running the model in FP16 precision to make better use of my GPU. ParlAI has versions of a few optimizers that can use FP16 models, and I have tried installing a couple of other optimizers that can also work with FP16 models by casting the state parameters and gradients to FP32 within the optimizer, determining the new state parameters with FP32 accuracy, and recasting the state parameters back to FP16 for updating the model. If you had a version of your library that automatically did this, it would greatly simplify its use with FP16 precision models. Thanks!

    P.S. It looks like adabelief, radam, and diffgrad do something like this, but not in a consistent way.

    feature request 
    opened by sjscotti 1
Releases: v2.1.1