PolynomialWithWarmupScheduler
class composer.optim.PolynomialWithWarmupScheduler(t_warmup, power=2.0, t_max='1dur', alpha_f=0.0, scale_warmup=False)[source]
Decays the learning rate according to a power of the fraction of training time left, with an initial warmup.
See also

This scheduler is based on PolynomialScheduler, with an added warmup.

Specifically, the learning rate multiplier \(\alpha\) can be expressed as:
\[\alpha(t) = \begin{cases} t / t_{warmup}, & \text{if } t < t_{warmup} \\ \alpha_f + (1 - \alpha_f) \times (1 - \tau_w)^{\kappa}, & \text{otherwise} \end{cases}\]

Given \(\tau_w\), the fraction of post-warmup time elapsed (clipped to the interval \([0, 1]\)), as:

\[\tau_w = (t - t_{warmup}) / t_{max}\]

Where \(\kappa\) represents the exponent used for the proportionality relationship, \(t_{warmup}\) represents the warmup time, \(t_{max}\) represents the duration of this scheduler, and \(\alpha_f\) represents the learning rate multiplier to decay to.
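As a concrete illustration of the formula above, here is a minimal sketch that computes the multiplier directly. The helper name poly_warmup_alpha is hypothetical and not part of composer.optim; it assumes all times are expressed in the same unit (e.g. batches):

```python
def poly_warmup_alpha(t: float, t_warmup: float, t_max: float,
                      power: float = 2.0, alpha_f: float = 0.0) -> float:
    """Compute the LR multiplier alpha(t) per the formula above.

    Hypothetical helper for illustration only; not part of the Composer API.
    """
    if t < t_warmup:
        # Linear warmup from 0 to 1 over the first t_warmup units.
        return t / t_warmup
    # Fraction of post-warmup time elapsed, clipped to [0, 1].
    tau_w = min(max((t - t_warmup) / t_max, 0.0), 1.0)
    # Polynomial decay from 1 down to alpha_f.
    return alpha_f + (1 - alpha_f) * (1 - tau_w) ** power

# With t_warmup=100 and t_max=1000 (in batches):
# poly_warmup_alpha(50, 100, 1000)  -> 0.5   (halfway through warmup)
# poly_warmup_alpha(600, 100, 1000) -> 0.25  ((1 - 0.5)^2 with power=2)
```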
Warning

By default, the initial warmup time is not scaled according to any provided scale schedule ratio (SSR). To change this behavior, set scale_warmup=True.

Parameters
- t_warmup (str | Time) – Warmup time.
- power (float) – The exponent to be used for the proportionality relationship. Default = 2.0.
- t_max (str | Time) – The duration of this scheduler. Default = "1dur".
- alpha_f (float) – Learning rate multiplier to decay to. Default = 0.0.
- scale_warmup (bool) – Whether SSR also scales the warmup period. Default = False.
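A minimal usage sketch follows. It assumes a ComposerModel and a training dataloader are already constructed; model and train_dataloader below are placeholders, and the schedule (100-batch warmup, quadratic decay to 0) is chosen only for illustration:

```python
from composer import Trainer
from composer.optim import PolynomialWithWarmupScheduler

# Warm up linearly for the first 100 batches, then decay the learning
# rate quadratically to 0 over the remainder of training.
scheduler = PolynomialWithWarmupScheduler(
    t_warmup='100ba',  # warmup duration, as a Time string (batches)
    power=2.0,
    t_max='1dur',      # decay over the full training duration
    alpha_f=0.0,
)

trainer = Trainer(
    model=model,                        # placeholder: your ComposerModel
    train_dataloader=train_dataloader,  # placeholder: your DataLoader
    max_duration='10ep',
    schedulers=scheduler,
)
trainer.fit()
```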