Apollo Optimizer in TensorFlow 2.x

Unofficial implementation of https://arxiv.org/abs/2009.13586

Official implementation: https://github.com/XuezheMax/apollo

Notes:

  • Warmup is important with the Apollo optimizer, so be sure to pass a learning rate schedule rather than a constant value for learning_rate. A one-cycle scheduler is provided as an example in one_cycle_lr_schedule.py (see the usage sketch after these notes).
  • To clip gradient norms as in the paper, pass either clipnorm (parameter-wise clipping by norm) or global_clipnorm to the optimizer's arguments (for example, clipnorm=0.1).
  • Decoupled weight decay is used by default.
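
A minimal usage sketch is below. The import paths, class names (Apollo, OneCycleLR), and the schedule's constructor arguments are assumptions for illustration and may not match this repo exactly; learning_rate, clipnorm, and the one_cycle_lr_schedule.py file are taken from the notes above.

```python
import tensorflow as tf

# Assumed module and class names for illustration; check the repo's actual
# files (e.g. apollo.py and one_cycle_lr_schedule.py) for the real names.
from apollo import Apollo
from one_cycle_lr_schedule import OneCycleLR

# One-cycle schedule with warmup (constructor arguments are hypothetical).
lr_schedule = OneCycleLR(max_lr=1e-2, total_steps=10_000)

# Pass the schedule (not a constant) as learning_rate, and clip gradient
# norms per parameter as in the paper. Decoupled weight decay is on by default.
optimizer = Apollo(
    learning_rate=lr_schedule,
    clipnorm=0.1,  # or global_clipnorm=0.1 for global norm clipping
)

# Wire the optimizer into a standard Keras training setup.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```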
