class composer.optim.LinearWithWarmupScheduler(t_warmup, alpha_i=1.0, alpha_f=0.0, t_max='1dur', scale_warmup=False)[source]#

Adjusts the learning rate linearly, with an initial warmup.

See also

This scheduler is based on LinearScheduler, with an added warmup.

Linearly adjusts the learning rate multiplier from \(\alpha_i\) to \(\alpha_f\) over \(t_{max}\) time.

Specifically, the learning rate multiplier \(\alpha\) can be expressed as:

\[\alpha(t) = \begin{cases} t / t_{warmup}, & \text{if } t < t_{warmup} \\ \alpha_i + (\alpha_f - \alpha_i) \times \tau_w, & \text{otherwise} \end{cases} \]

Given \(\tau_w\), the fraction of post-warmup time elapsed (clipped to the interval \([0, 1]\)), as:

\[\tau_w = (t - t_{warmup}) / t_{max} \]

Where \(t_{warmup}\) represents the warmup time, \(\alpha_i\) represents the initial learning rate multiplier, \(\alpha_f\) represents the learning rate multiplier to decay to, and \(t_{max}\) represents the duration of this scheduler.
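As a plain-Python sketch (not the library's implementation), the piecewise formula above can be written directly, with `t`, `t_warmup`, and `t_max` taken as fractions of the total training duration:

```python
def alpha(t, t_warmup, t_max, alpha_i=1.0, alpha_f=0.0):
    """Sketch of the learning rate multiplier alpha(t) defined above."""
    if t < t_warmup:
        # Linear warmup from 0 up to 1 at t = t_warmup.
        return t / t_warmup
    # Fraction of post-warmup time elapsed, clipped to [0, 1].
    tau_w = (t - t_warmup) / t_max
    tau_w = min(1.0, max(0.0, tau_w))
    # Linear interpolation from alpha_i down to alpha_f.
    return alpha_i + (alpha_f - alpha_i) * tau_w
```

For example, halfway through a warmup of `t_warmup=0.1` the multiplier is 0.5, and once `tau_w` clips at 1 the multiplier stays at `alpha_f`.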


By default, the initial warmup time is not scaled according to any provided scale schedule ratio! However, the duration of the scheduler is still scaled accordingly. To achieve this, after warmup, the scheduler's "slope" will be slightly distorted from what would otherwise be expected. To scale the entire schedule, set scale_warmup=True.
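A minimal sketch of this interaction (assumed behavior, not the library's code): the scale schedule ratio (SSR) always scales `t_max`, but scales `t_warmup` only when `scale_warmup=True`.

```python
def effective_times(t_warmup, t_max, ssr, scale_warmup=False):
    """Sketch: apply a scale schedule ratio (SSR) to the scheduler's times.

    t_max is always scaled by the SSR; t_warmup is scaled only when
    scale_warmup=True.
    """
    warmup = t_warmup * ssr if scale_warmup else t_warmup
    return warmup, t_max * ssr
```

With `ssr=0.5` and `scale_warmup=False`, a warmup of 0.1 is kept while `t_max` is halved, so the post-warmup decay is compressed into less time and its slope steepens relative to the unscaled schedule.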

Parameters:

  • t_warmup (str | Time) – Warmup time.

  • alpha_i (float) – Initial learning rate multiplier. Default = 1.0.

  • alpha_f (float) – Final learning rate multiplier. Default = 0.0.

  • t_max (str | Time) – The duration of this scheduler. Default = "1dur".

  • scale_warmup (bool) – If True, the scale schedule ratio (SSR) also scales the warmup period. Default = False.