MultiStepWithWarmupScheduler#
- class composer.optim.MultiStepWithWarmupScheduler(t_warmup, milestones, gamma=0.1, scale_warmup=False)[source]#
Decays the learning rate discretely at fixed milestones, with an initial warmup.
See also
This scheduler is based on
MultiStepScheduler
, with an added warmup.
Starts with a linear warmup over
t_warmup
time, then decays the learning rate by a factor of
gamma
whenever a time milestone in
milestones
is reached.
Specifically, the learning rate multiplier \(\alpha\) can be expressed as:
\[\alpha(t) = \begin{cases} t / t_{warmup}, & \text{if } t < t_{warmup} \\ \gamma ^ x & \text{otherwise} \end{cases} \]
Where \(t_{warmup}\) represents the warmup time, \(x\) represents the number of milestones that have been reached, and \(\gamma\) represents the multiplicative decay factor.
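The piecewise formula above can be sketched as a small standalone function. This is an illustrative reimplementation of the multiplier, not the library's internal code, and it assumes time is measured as a plain step count rather than Composer's `Time` objects:

```python
def alpha(t, t_warmup, milestones, gamma=0.1):
    """Illustrative sketch of the learning rate multiplier alpha(t)."""
    if t < t_warmup:
        # Linear warmup from 0 toward 1.
        return t / t_warmup
    # x = number of milestones already reached at time t.
    x = sum(1 for m in milestones if t >= m)
    return gamma ** x

# Warmup for 100 steps, then decay at steps 500 and 800.
print(alpha(50, 100, [500, 800]))   # mid-warmup: 0.5
print(alpha(300, 100, [500, 800]))  # warmup done, no milestones: 1.0
print(alpha(600, 100, [500, 800]))  # one milestone passed: 0.1
```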
Warning
All milestones should be greater than
t_warmup
; otherwise, they will have no effect on the computed learning rate multiplier until the warmup has completed.
Warning
By default, initial warmup time is not scaled according to any provided scale schedule ratio. To change this behavior, set
scale_warmup=True
.
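The effect of the second warning can be sketched as follows. This is a hedged illustration of the scale-schedule-ratio (ssr) interaction, not Composer's API: the function name and signature here are hypothetical, and it assumes milestones are always scaled by the ratio while the warmup duration is scaled only when `scale_warmup=True`:

```python
def effective_times(t_warmup, milestones, ssr, scale_warmup=False):
    """Hypothetical sketch: which schedule times a scale schedule ratio rescales.

    Milestones are scaled by ssr; t_warmup is scaled only if scale_warmup=True.
    """
    warmup = t_warmup * ssr if scale_warmup else t_warmup
    return warmup, [m * ssr for m in milestones]

# With ssr=0.5 the milestones move, but warmup stays fixed by default...
print(effective_times(100, [500, 800], ssr=0.5))
# ...unless scale_warmup=True, which rescales the warmup as well.
print(effective_times(100, [500, 800], ssr=0.5, scale_warmup=True))
```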