MultiStepWithWarmupScheduler
class composer.optim.MultiStepWithWarmupScheduler(t_warmup, milestones, gamma=0.1, scale_warmup=False)
Decays the learning rate discretely at fixed milestones, with an initial warmup.
See also
This scheduler is based on MultiStepScheduler, with an added warmup.

Starts with a linear warmup over t_warmup time, then decays the learning rate by a factor of gamma whenever a time milestone in milestones is reached.

Specifically, the learning rate multiplier \(\alpha\) can be expressed as:
\[\alpha(t) = \begin{cases} t / t_{warmup}, & \text{if } t < t_{warmup} \\ \gamma ^ x, & \text{otherwise} \end{cases}\]

Where \(t_{warmup}\) represents the warmup time, \(x\) represents the number of milestones that have been reached, and \(\gamma\) represents the multiplicative decay factor.
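The piecewise formula above can be sketched as a plain Python function. This is an illustrative reimplementation of the multiplier, not the library's internal code; `multistep_warmup_multiplier` is a hypothetical helper name, and times and milestones are given here as plain step counts for simplicity.

```python
def multistep_warmup_multiplier(t, t_warmup, milestones, gamma=0.1):
    """Learning-rate multiplier alpha(t): linear warmup, then step decay."""
    if t < t_warmup:
        # Linear warmup phase: multiplier ramps from 0 up to 1
        return t / t_warmup
    # x = number of milestones reached by time t
    x = sum(1 for m in milestones if t >= m)
    return gamma ** x

# Example: 5-step warmup, milestones at steps 30 and 60, gamma = 0.1
assert multistep_warmup_multiplier(2, 5, [30, 60]) == 0.4   # mid-warmup: 2 / 5
assert multistep_warmup_multiplier(10, 5, [30, 60]) == 1.0  # warmup done, no milestone yet
assert multistep_warmup_multiplier(45, 5, [30, 60]) == 0.1  # one milestone passed: 0.1 ** 1
```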
Warning
All milestones should be greater than t_warmup; otherwise, they will have no effect on the computed learning rate multiplier until the warmup has completed.

Warning
By default, the initial warmup time is not scaled according to any provided scale schedule ratio. To change this behavior, set scale_warmup=True.
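The effect of scale_warmup can be illustrated with a small sketch. Assuming a scale schedule ratio (here `ssr`) that stretches or compresses the milestones, the warmup duration is left untouched unless scale_warmup is enabled; `scaled_multiplier` is a hypothetical helper written for this illustration, not the library's implementation.

```python
def scaled_multiplier(t, t_warmup, milestones, gamma=0.1, ssr=1.0, scale_warmup=False):
    """Multiplier under a scale schedule ratio (illustrative sketch only)."""
    # Milestones are scaled by the schedule ratio; warmup only if requested
    warmup = t_warmup * ssr if scale_warmup else t_warmup
    scaled_ms = [m * ssr for m in milestones]
    if t < warmup:
        return t / warmup
    return gamma ** sum(1 for m in scaled_ms if t >= m)

# With ssr=0.5 the milestone moves from step 100 to step 50, but the
# 10-step warmup is only halved when scale_warmup=True:
assert scaled_multiplier(5, 10, [100], ssr=0.5) == 0.5                     # still warming up
assert scaled_multiplier(5, 10, [100], ssr=0.5, scale_warmup=True) == 1.0  # 5-step warmup done
```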