CosineAnnealingWithWarmup

Formulation

The learning rate is annealed using a cosine schedule over the course of learning of n_total total steps with an initial warmup period of n_warmup steps. Hence, the learning rate at step i is computed as:

Learning rate will be changed as:

Usage

# optimizer, warmup_epochs, warmup_lr, num_epochs, base_lr, final_lr, iter_per_epoch
lr_scheduler = LR_Scheduler(
        optimizer,
        args.warmup_epochs, args.warmup_lr*args.batch_size/256, 
        args.epochs, args.lr*args.batch_size/256, args.final_lr*args.batch_size/256, 
        len(train_loader),
    )

for data in range(train_loader):
  optimizer.zero_grad()
  output = model(data) 
  loss = lossfunc(output,gt)
  loss.backward()
  optimizer.step()
  lr_scheduler.step()

In CV domain [1,2], in order to automatically adapt different batch size you can use a learning rate of lr×BatchSize/256 (linear scaling [4])(we can use larger learning rate while adopting larger batch size, especially, when you use LARS optimizer[3]). Of course, you can modify it according to your specific requirements.

Reference

[1] Jean-Bastien Grill, Florian Strub, Florent Altch´e, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Do- ersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Moham- mad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, R´emi Munos, and Michal Valko. Bootstrap your own latent: A new approach to self-supervised learning. arXiv:2006.07733v1, 2020.
[2] Chen X, He K. Exploring simple siamese representation learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 15750-15758.
[3] Yang You, Igor Gitman, and Boris Ginsburg. Large batch training of convolutional networks. arXiv:1708.03888, 2017.
[4] Priya Goyal, Piotr Doll´ar, Ross Girshick, Pieter Noord- huis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv:1706.02677, 2017.
[5] https://github.com/PatrickHua/SimSiam?utm_source=catalyzex.com

Cosine Annealing With Warmup

Related tags

Overview

CosineAnnealingWithWarmup

Formulation

Usage

Reference

Owner

zhuyun

Nsdf: A mesh SDF with just some code we can directly paste into our raymarcher

My implementation of transformers related papers for computer vision in pytorch

Original code for "Zero-Shot Domain Adaptation with a Physics Prior"

Pytorch implementation of MaskFlownet

Fast and exact ILP-based solvers for the Minimum Flow Decomposition (MFD) problem, and variants of it.

Compute FID scores with PyTorch.

Weakly- and Semi-Supervised Panoptic Segmentation (ECCV18)

The code for Bi-Mix: Bidirectional Mixing for Domain Adaptive Nighttime Semantic Segmentation

A library that can print Python objects in human readable format

Deploy recommendation engines with Edge Computing

This repository contains the code for the paper "Hierarchical Motion Understanding via Motion Programs"

A Sign Language detection project using Mediapipe landmark detection and Tensorflow LSTM's

git《Investigating Loss Functions for Extreme Super-Resolution》(CVPR 2020) GitHub:

QT Py Media Knob using rotary encoder & neopixel ring

RM Operation can equivalently convert ResNet to VGG, which is better for pruning; and can help RepVGG perform better when the depth is large.

Codes for AAAI22 paper "Learning to Solve Travelling Salesman Problem with Hardness-Adaptive Curriculum"

MXNet implementation for: Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

TensorFlow implementation of "Attention is all you need (Transformer)"

Solution of Kaggle competition: Sartorius - Cell Instance Segmentation

Machine learning algorithms for many-body quantum systems