CosineAnnealingWithWarmupScheduler

class composer.optim.CosineAnnealingWithWarmupScheduler(t_warmup, t_max='1dur', alpha_f=0.0, scale_warmup=False)

Decays the learning rate according to the decreasing part of a cosine curve, with an initial warmup.

See also

This scheduler is based on CosineAnnealingScheduler, with an added warmup.

Specifically, the learning rate multiplier $\alpha$ can be expressed as:

$$\alpha(t) = \begin{cases} t / t_{warmup}, & \text{if } t < t_{warmup} \\ \alpha_f + (1 - \alpha_f) \times \frac{1}{2} (1 + \cos(\pi \times \tau_w)), & \text{otherwise} \end{cases}$$

Here $\tau_w$, the fraction of post-warmup time elapsed (clipped to the interval $[0, 1]$), is given by:

$$\tau_w = (t - t_{warmup}) / t_{max}$$

Where $t_{warmup}$ represents the warmup time, $t_{max}$ represents the duration of this scheduler, and $\alpha_f$ represents the learning rate multiplier to decay to.
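As a rough illustration, the multiplier can be sketched in plain Python. This follows the formula exactly as written on this page and is not the library's implementation. The function name `cosine_warmup_multiplier` is hypothetical, and times are assumed to be plain numbers in a single unit (for example, batches) rather than `Time` objects.

```python
import math


def cosine_warmup_multiplier(t, t_warmup, t_max, alpha_f=0.0):
    """Sketch of the learning rate multiplier alpha(t) from the formula on this page.

    Hypothetical helper: t, t_warmup, and t_max are plain numbers in the same unit.
    """
    # Linear warmup: the multiplier rises from 0 toward 1 while t < t_warmup.
    if t < t_warmup:
        return t / t_warmup

    # Fraction of post-warmup time elapsed, clipped to [0, 1].
    tau_w = (t - t_warmup) / t_max
    tau_w = min(1.0, max(0.0, tau_w))

    # Decreasing half of a cosine curve, decaying from 1 to alpha_f.
    return alpha_f + (1 - alpha_f) * 0.5 * (1 + math.cos(math.pi * tau_w))


if __name__ == "__main__":
    # Print the multiplier at a few points for a 10-step warmup and t_max = 100.
    for step in (0, 5, 10, 60, 110):
        print(step, cosine_warmup_multiplier(step, t_warmup=10, t_max=100, alpha_f=0.1))
```

In this sketch the multiplier peaks at 1 when warmup ends and settles at `alpha_f` once $\tau_w$ reaches 1.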

Warning

By default, initial warmup time is not scaled according to any provided scale schedule ratio. To change this behavior, set scale_warmup=True.
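The effect of this flag can be pictured with a small sketch. It assumes that a scale schedule ratio (SSR) acts as a simple multiplier on durations, which is an assumption made for illustration and not something this page states. The helper `effective_durations` is hypothetical and uses plain numbers instead of `Time` objects.

```python
def effective_durations(t_warmup, t_max, ssr, scale_warmup=False):
    """Hypothetical helper: durations after applying a scale schedule ratio.

    Assumes SSR simply multiplies a duration. This is an illustrative
    assumption, not the library's implementation.
    """
    # The scheduler duration is assumed to always be scaled by the SSR.
    scaled_t_max = t_max * ssr
    # The warmup period is only scaled when scale_warmup is True.
    scaled_t_warmup = t_warmup * ssr if scale_warmup else t_warmup
    return scaled_t_warmup, scaled_t_max


if __name__ == "__main__":
    # Halving the schedule leaves a 10-step warmup untouched by default.
    print(effective_durations(10, 100, ssr=0.5))
    # With scale_warmup=True the warmup is halved as well.
    print(effective_durations(10, 100, ssr=0.5, scale_warmup=True))
```

Under these assumptions, a short SSR with the default `scale_warmup=False` leaves warmup occupying a larger share of the total schedule.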

Parameters
  • t_warmup (str | Time) – Warmup time.

  • t_max (str | Time) – The duration of this scheduler. Default = "1dur".

  • alpha_f (float) – Learning rate multiplier to decay to. Default = 0.0.

  • scale_warmup (bool) – Whether the scale schedule ratio (SSR) also scales the warmup period. Default = False.