PolynomialWithWarmupScheduler
- class composer.optim.PolynomialWithWarmupScheduler(t_warmup, power=2.0, t_max='1dur', alpha_f=0.0, scale_warmup=False)[source]
Decays the learning rate according to a power of the fraction of training time left, with an initial warmup.
See also
This scheduler is based on PolynomialScheduler, with an added warmup.

Specifically, the learning rate multiplier can be expressed as:

\[ \alpha(t) = \begin{cases} t / t_{warmup}, & \text{if } t < t_{warmup} \\ \alpha_f + (1 - \alpha_f) \times (1 - \tau_w)^{\kappa}, & \text{otherwise} \end{cases} \]

Given \( \tau_w \), the fraction of post-warmup time elapsed (clipped to the interval \( [0, 1] \)), as:

\[ \tau_w = (t - t_{warmup}) / t_{max} \]

Where \( \kappa \) represents the exponent to be used for the proportionality relationship, \( t_{warmup} \) represents the warmup time, \( t_{max} \) represents the duration of this scheduler, and \( \alpha_f \) represents the learning rate multiplier to decay to.
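The multiplier above can be sketched in plain Python. This is a minimal illustration of the documented formula, not Composer's implementation: the function name is hypothetical, and times are taken as plain floats in a common unit rather than Composer `Time` objects.

```python
def poly_warmup_multiplier(t: float, t_warmup: float, t_max: float,
                           power: float = 2.0, alpha_f: float = 0.0) -> float:
    """Learning-rate multiplier: linear warmup, then polynomial decay."""
    if t < t_warmup:
        # Linear ramp from 0 up to 1 over the warmup period.
        return t / t_warmup
    # Fraction of post-warmup time elapsed, clipped to [0, 1].
    tau_w = min(max((t - t_warmup) / t_max, 0.0), 1.0)
    # Decay from 1 toward alpha_f as (1 - tau_w)^power shrinks.
    return alpha_f + (1 - alpha_f) * (1 - tau_w) ** power
```

For example, with `t_warmup=10` and `t_max=100`, the multiplier is 0.5 halfway through warmup, 1.0 when warmup ends, and decays quadratically toward `alpha_f` afterward.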
Warning
By default, initial warmup time is not scaled according to any provided scale schedule ratio. To change this behavior, set
scale_warmup=True.

- Parameters
  - t_warmup (str | Time) — Warmup time.
  - power (float) — The exponent to be used for the proportionality relationship. Default = 2.0.
  - t_max (str | Time) — The duration of this scheduler. Default = "1dur".
  - alpha_f (float) — Learning rate multiplier to decay to. Default = 0.0.
  - scale_warmup (bool) — SSR also scales the warmup period. Default = False.
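A typical way to use these parameters is to construct the scheduler and hand it to a Composer `Trainer`. This is a hedged sketch, not a runnable script: `model` and `train_dl` stand in for your own `ComposerModel` and `DataLoader`, and the chosen durations are illustrative.

```python
from composer import Trainer
from composer.optim import PolynomialWithWarmupScheduler

scheduler = PolynomialWithWarmupScheduler(
    t_warmup='1ep',  # warm up linearly over the first epoch
    power=2.0,       # quadratic decay after warmup
    alpha_f=0.0,     # decay the multiplier all the way to 0
)

trainer = Trainer(
    model=model,                # placeholder: your ComposerModel
    train_dataloader=train_dl,  # placeholder: your DataLoader
    max_duration='10ep',
    schedulers=scheduler,
)
trainer.fit()
```

Because `t_max` defaults to `'1dur'`, the decay spans the full training duration (here, 10 epochs) without further configuration.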