CosineAnnealingWithWarmupScheduler

class composer.optim.CosineAnnealingWithWarmupScheduler(t_warmup, t_max='1dur', alpha_f=0.0, scale_warmup=False)

Decays the learning rate according to the decreasing part of a cosine curve, with an initial warmup.

See also

This scheduler is based on CosineAnnealingScheduler, with an added warmup.

Specifically, the learning rate multiplier \(\alpha\) can be expressed as:

\[\alpha(t) = \begin{cases} t / t_{warmup}, & \text{if } t < t_{warmup} \\ \alpha_f + (1 - \alpha_f) \times \frac{1}{2} (1 + \cos(\pi \times \tau_w)) & \text{otherwise} \end{cases} \]

Here \(\tau_w\) is the fraction of post-warmup time elapsed, clipped to the interval \([0, 1]\):

\[\tau_w = (t - t_{warmup}) / t_{max} \]

Where \(t_{warmup}\) represents the warmup time, \(t_{max}\) represents the duration of this scheduler, and \(\alpha_f\) represents the learning rate multiplier to decay to.
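As a rough illustration, the piecewise formula above can be sketched in plain Python. This follows the equations exactly as written in this section and is not Composer's implementation. The helper name `cosine_warmup_multiplier` is hypothetical, and times are assumed to be plain numbers in a single unit (for example, batches) with `t_max > 0`.

```python
import math


def cosine_warmup_multiplier(t, t_warmup, t_max, alpha_f=0.0):
    """Learning rate multiplier alpha(t), per the formula in this section.

    Hypothetical helper, not part of Composer. All times are plain numbers
    in the same unit, and t_max is assumed to be positive.
    """
    if t < t_warmup:
        # Linear warmup: the multiplier rises from 0 toward 1.
        return t / t_warmup
    # Fraction of post-warmup time elapsed, clipped to [0, 1].
    tau_w = min(1.0, max(0.0, (t - t_warmup) / t_max))
    # Decreasing half of a cosine curve, from 1 down to alpha_f.
    return alpha_f + (1.0 - alpha_f) * 0.5 * (1.0 + math.cos(math.pi * tau_w))


# Example values, assuming a 100-step warmup and t_max of 1000 steps.
for step in (0, 50, 100, 600, 1100):
    alpha = cosine_warmup_multiplier(step, t_warmup=100, t_max=1000, alpha_f=0.1)
    print(step, round(alpha, 4))
```

With these assumed settings, the multiplier climbs linearly during warmup, reaches 1 at the end of warmup, and then decays along the cosine curve until it settles at `alpha_f`.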

Warning

By default, the initial warmup time is not scaled by the scale schedule ratio (SSR), even when one is provided. To change this behavior, set scale_warmup=True.

Parameters
  • t_warmup (str | Time) – Warmup time.

  • t_max (str | Time) – The duration of this scheduler. Default = "1dur".

  • alpha_f (float) – Learning rate multiplier to decay to. Default = 0.0.

  • scale_warmup (bool) – Whether the scale schedule ratio (SSR) also scales the warmup period. Default = False.