class composer.optim.MultiStepWithWarmupScheduler(t_warmup, milestones, gamma=0.1, scale_warmup=False)[source]#

Decays the learning rate discretely at fixed milestones, with an initial warmup.

See also

This scheduler is based on MultiStepScheduler, with an added warmup.

Starts with a linear warmup over t_warmup time, then decays the learning rate by a factor of gamma whenever a time milestone in milestones is reached.

Specifically, the learning rate multiplier \(\alpha\) can be expressed as:

\[\alpha(t) = \begin{cases} t / t_{warmup}, & \text{if } t < t_{warmup} \\ \gamma ^ x, & \text{otherwise} \end{cases} \]

Where \(t_{warmup}\) represents the warmup time, \(x\) represents the number of milestones that have been reached, and \(\gamma\) represents the multiplicative decay factor.


All milestones should be greater than t_warmup; otherwise, they will have no effect on the computed learning rate multiplier until the warmup has completed.
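The piecewise formula above can be sketched as a plain Python function. This is an illustrative sketch, not the Composer implementation: `multistep_warmup_alpha` is a hypothetical name, and times are treated as plain numbers (e.g. epoch counts) rather than Composer `Time` objects, which also support batches, samples, and durations.

```python
def multistep_warmup_alpha(t, t_warmup, milestones, gamma=0.1):
    """Illustrative sketch of the LR multiplier alpha(t) described above.

    Times are plain numbers here; Composer's schedulers operate on Time
    objects (e.g. "10ep") instead.
    """
    if t < t_warmup:
        # Linear warmup from 0 toward 1 over t_warmup.
        return t / t_warmup
    # x = number of milestones already reached; decay by gamma per milestone.
    x = sum(1 for m in milestones if t >= m)
    return gamma ** x

# With t_warmup=5 and milestones=[30, 60]:
#   alpha(2)  -> 2/5 = 0.4   (still warming up)
#   alpha(10) -> 1.0         (warmup done, no milestone reached)
#   alpha(70) -> gamma**2, i.e. ~0.01 with the default gamma=0.1
```

Note that, per the caveat above, a milestone placed below `t_warmup` would already count toward \(x\) the moment warmup ends, which is why milestones should be greater than the warmup time.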


By default, initial warmup time is not scaled according to any provided scale schedule ratio. To change this behavior, set scale_warmup=True.
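The effect of scale_warmup can be sketched as follows. This is a simplified illustration of the behavior described above, not Composer's internal code: `scaled_schedule_times` is a hypothetical helper, and times are plain numbers rather than `Time` objects.

```python
def scaled_schedule_times(t_warmup, milestones, ssr, scale_warmup=False):
    """Sketch of how a scale schedule ratio (SSR) is applied.

    Milestones are always scaled by ssr; the warmup period is scaled
    only when scale_warmup=True, mirroring the default described above.
    """
    scaled_warmup = t_warmup * ssr if scale_warmup else t_warmup
    scaled_milestones = [m * ssr for m in milestones]
    return scaled_warmup, scaled_milestones

# With ssr=0.5, t_warmup=5, milestones=[30, 60]:
#   scale_warmup=False -> warmup stays 5, milestones become [15.0, 30.0]
#   scale_warmup=True  -> warmup becomes 2.5, milestones become [15.0, 30.0]
```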

Parameters

  • t_warmup (str | Time) – Warmup time.

  • milestones (List[str | Time]) – Times at which the learning rate should change.

  • gamma (float) – Multiplicative decay factor. Default = 0.1.

  • scale_warmup (bool) – Whether the scale schedule ratio (SSR) also scales the warmup period. Default = False.