LinearWithWarmupScheduler
class composer.optim.LinearWithWarmupScheduler(t_warmup, alpha_i=1.0, alpha_f=0.0, t_max='1dur', scale_warmup=False)[source]
Adjusts the learning rate linearly, with an initial warmup.
See also
This scheduler is based on LinearScheduler, with an added warmup.
Linearly adjusts the learning rate multiplier from \(\alpha_i\) to \(\alpha_f\) over \(t_{max}\) time. Specifically, the learning rate multiplier \(\alpha\) can be expressed as:
\[\alpha(t) = \begin{cases} t / t_{warmup}, & \text{if } t < t_{warmup} \\ \alpha_i + (\alpha_f - \alpha_i) \times \tau_w, & \text{otherwise} \end{cases}\]

Given \(\tau_w\), the fraction of post-warmup time elapsed (clipped to the interval \([0, 1]\)), as:

\[\tau_w = (t - t_{warmup}) / t_{max}\]

Where \(t_{warmup}\) represents the warmup time, \(\alpha_i\) represents the initial learning rate multiplier, \(\alpha_f\) represents the learning rate multiplier to decay to, and \(t_{max}\) represents the duration of this scheduler.
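To make the formula concrete, here is a minimal pure-Python sketch of the multiplier computation. This is an illustrative helper, not Composer's implementation; the function name and the assumption that times are plain floats in the same units are inventions for this example:

```python
def linear_with_warmup_multiplier(t, t_warmup, t_max, alpha_i=1.0, alpha_f=0.0):
    """Illustrative sketch of the multiplier formula above.

    Assumes t, t_warmup, and t_max are floats in the same units
    (e.g., batches). Not Composer's actual implementation.
    """
    if t < t_warmup:
        # Linear warmup from 0 up to 1.
        return t / t_warmup
    # Fraction of post-warmup time elapsed, clipped to [0, 1].
    tau_w = (t - t_warmup) / t_max
    tau_w = min(max(tau_w, 0.0), 1.0)
    return alpha_i + (alpha_f - alpha_i) * tau_w

# With t_warmup=100 and t_max=1000 (defaults alpha_i=1.0, alpha_f=0.0):
# t=50  -> 0.5  (halfway through warmup)
# t=100 -> 1.0  (warmup complete)
# t=600 -> 0.5  (tau_w = 0.5, halfway decayed)
```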
Warning
By default, the initial warmup time is not scaled according to any provided scale schedule ratio, although the total duration of the scheduler is still scaled accordingly. To achieve this, the scheduler's post-warmup "slope" will be slightly distorted from what would otherwise be expected. To scale the entire schedule, including the warmup, set scale_warmup=True.
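For example, a sketch of enabling scale_warmup alongside the Trainer's scale_schedule_ratio argument; model and train_dataloader are placeholders for your own objects, and the specific durations are assumptions for this example:

```python
from composer import Trainer
from composer.optim import LinearWithWarmupScheduler

# Warm up for 1 epoch, then decay the multiplier linearly from 1.0
# to 0.0 over the remainder of training. scale_warmup=True lets a
# scale schedule ratio compress the warmup along with the rest of
# the schedule, preserving the post-warmup slope.
scheduler = LinearWithWarmupScheduler(
    t_warmup='1ep',
    alpha_i=1.0,
    alpha_f=0.0,
    scale_warmup=True,
)

trainer = Trainer(
    model=model,                        # placeholder: your ComposerModel
    train_dataloader=train_dataloader,  # placeholder: your DataLoader
    max_duration='10ep',
    schedulers=scheduler,
    scale_schedule_ratio=0.5,           # whole schedule, warmup included,
                                        # now runs over 5 epochs
)
```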