# PolynomialWithWarmupScheduler

class composer.optim.PolynomialWithWarmupScheduler(t_warmup, power=2.0, t_max='1dur', alpha_f=0.0, scale_warmup=False)

Decays the learning rate according to a power of the fraction of training time left, with an initial warmup.

This scheduler is based on PolynomialScheduler, with an added warmup.

Specifically, the learning rate multiplier $$\alpha$$ can be expressed as:

$\alpha(t) = \begin{cases} t / t_{warmup}, & \text{if } t < t_{warmup} \\ \alpha_f + (1 - \alpha_f) \times (1 - \tau_w) ^ {\kappa} & \text{otherwise} \end{cases}$

where $$\tau_w$$, the fraction of post-warmup time elapsed (clipped to the interval $$[0, 1]$$), is given by:

$\tau_w = (t - t_{warmup}) / t_{max}$

Where $$\kappa$$ represents the exponent to be used for the proportionality relationship, $$t_{warmup}$$ represents the warmup time, $$t_{max}$$ represents the duration of this scheduler, and $$\alpha_f$$ represents the learning rate multiplier to decay to.
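The piecewise formula above can be sketched in plain Python. This is a hypothetical illustration of the documented equations, not Composer's actual implementation; `t`, `t_warmup`, and `t_max` are assumed to be plain floats in the same time units:

```python
def poly_warmup_multiplier(t, t_warmup, t_max, power=2.0, alpha_f=0.0):
    """Learning rate multiplier alpha(t) per the formula above (illustrative sketch)."""
    if t < t_warmup:
        # Linear warmup: alpha rises from 0 to 1 over the warmup period.
        return t / t_warmup
    # Fraction of post-warmup time elapsed, clipped to [0, 1].
    tau_w = min(max((t - t_warmup) / t_max, 0.0), 1.0)
    # Polynomial decay from 1 down to alpha_f with exponent `power`.
    return alpha_f + (1 - alpha_f) * (1 - tau_w) ** power
```

For example, with `t_warmup=10` and `t_max=100`, the multiplier is 0.5 halfway through warmup, 1.0 when warmup ends, 0.25 halfway through the decay (with the default `power=2.0`), and `alpha_f` once the scheduler's duration has elapsed.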

Warning

By default, initial warmup time is not scaled according to any provided scale schedule ratio. To change this behavior, set scale_warmup=True.

Parameters
• t_warmup (str | Time) – Warmup time.

• power (float) – The exponent to be used for the proportionality relationship. Default = 2.0.

• t_max (str | Time) – The duration of this scheduler. Default = "1dur".

• alpha_f (float) – Learning rate multiplier to decay to. Default = 0.0.

• scale_warmup (bool) – Whether the scale schedule ratio (SSR) also scales the warmup period. Default = False.