PolynomialWithWarmupScheduler#

class composer.optim.PolynomialWithWarmupScheduler(t_warmup, power=2.0, t_max='1dur', alpha_f=0.0, scale_warmup=False)[source]#

Decays the learning rate according to a power of the fraction of training time left, with an initial warmup.

See also

This scheduler is based on PolynomialScheduler, with an added warmup.

Specifically, the learning rate multiplier \(\alpha\) can be expressed as:

\[\alpha(t) = \begin{cases} t / t_{warmup}, & \text{if } t < t_{warmup} \\ \alpha_f + (1 - \alpha_f) \times (1 - \tau_w) ^ {\kappa}, & \text{otherwise} \end{cases} \]

where \(\tau_w\), the fraction of post-warmup time elapsed (clipped to the interval \([0, 1]\)), is defined as:

\[\tau_w = (t - t_{warmup}) / t_{max} \]

Where \(\kappa\) represents the exponent to be used for the proportionality relationship, \(t_{warmup}\) represents the warmup time, \(t_{max}\) represents the duration of this scheduler, and \(\alpha_f\) represents the learning rate multiplier to decay to.
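To make the piecewise definition concrete, here is a minimal sketch of the multiplier as a plain Python function. This illustrates the formula above rather than the library's implementation (the actual scheduler operates on Composer Time objects and trainer state); all times are assumed to be expressed in the same unit.

```python
def poly_warmup_multiplier(t, t_warmup, t_max, power=2.0, alpha_f=0.0):
    """Sketch of alpha(t) from the formula above; times share one unit."""
    if t < t_warmup:
        # Linear warmup from 0 up to 1.
        return t / t_warmup
    # Fraction of post-warmup time elapsed, clipped to [0, 1].
    tau_w = min(max((t - t_warmup) / t_max, 0.0), 1.0)
    # Polynomial decay from 1 down to alpha_f.
    return alpha_f + (1 - alpha_f) * (1 - tau_w) ** power

# Halfway through the decay phase with power=2.0 and alpha_f=0.0,
# the multiplier is (1 - 0.5) ** 2 = 0.25:
assert poly_warmup_multiplier(t=60, t_warmup=10, t_max=100) == 0.25
```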

Warning

By default, initial warmup time is not scaled according to any provided scale schedule ratio. To change this behavior, set scale_warmup=True.
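A brief sketch of what this flag toggles, assuming the schedule is compressed via the Trainer's scale_schedule_ratio argument:

```python
from composer.optim import PolynomialWithWarmupScheduler

# With the default scale_warmup=False, a run with scale_schedule_ratio=0.5
# halves the decay phase but keeps the full 1-epoch warmup. Setting
# scale_warmup=True shrinks the warmup to half an epoch as well.
scheduler = PolynomialWithWarmupScheduler(t_warmup='1ep', scale_warmup=True)
```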

Parameters
  • t_warmup (str | Time) – Warmup time.

  • power (float) – The exponent to be used for the proportionality relationship. Default = 2.0.

  • t_max (str | Time) – The duration of this scheduler. Default = "1dur".

  • alpha_f (float) – Learning rate multiplier to decay to. Default = 0.0.

  • scale_warmup (bool) – Whether the scale schedule ratio (SSR) also scales the warmup period. Default = False.
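
Below is a minimal usage sketch with Composer's Trainer; the model, dataloader, and optimizer are placeholders you would supply yourself:

```python
from composer import Trainer
from composer.optim import PolynomialWithWarmupScheduler

# Warm up linearly for the first epoch, then decay quadratically
# toward alpha_f over the remaining training duration ('1dur').
scheduler = PolynomialWithWarmupScheduler(t_warmup='1ep', power=2.0, alpha_f=0.0)

trainer = Trainer(
    model=model,                        # placeholder: your ComposerModel
    train_dataloader=train_dataloader,  # placeholder: your DataLoader
    optimizers=optimizer,               # placeholder: your torch optimizer
    schedulers=scheduler,
    max_duration='10ep',
)
trainer.fit()
```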