CosineAnnealingWithWarmupScheduler
class composer.optim.CosineAnnealingWithWarmupScheduler(t_warmup, t_max='1dur', alpha_f=0.0, scale_warmup=False)
Decays the learning rate according to the decreasing part of a cosine curve, with an initial warmup.
See also
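This scheduler is based on CosineAnnealingScheduler, with an added warmup.

Specifically, the learning rate multiplier $\alpha$ can be expressed as:

$$
\alpha(t) = \begin{cases}
    t / t_{warmup}, & \text{if } t < t_{warmup} \\
    \alpha_f + (1 - \alpha_f) \times \frac{1}{2} \left(1 + \cos(\pi \times \tau_w)\right), & \text{otherwise}
\end{cases}
$$

where $\tau_w$, the fraction of post-warmup time elapsed (clipped to the interval $[0, 1]$), is given by:

$$
\tau_w = (t - t_{warmup}) / t_{max}
$$

Here $t_{warmup}$ represents the warmup time, $t_{max}$ represents the duration of this scheduler, and $\alpha_f$ represents the learning rate multiplier to decay to.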
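As a rough illustration, the multiplier can be sketched in plain Python. This is a minimal sketch, not Composer's implementation. The function name `cosine_warmup_multiplier` is hypothetical, and the sketch makes several assumptions: all times are plain numbers in the same unit (for example, batches), warmup rises linearly from 0 to 1, and the post-warmup fraction is `(t - t_warmup) / t_max` clipped to `[0, 1]`. The library's actual handling of time units and normalization may differ.

```python
import math


def cosine_warmup_multiplier(t, t_warmup, t_max, alpha_f=0.0):
    """Hypothetical helper: learning rate multiplier at time t.

    Assumes t, t_warmup, and t_max are numbers in the same unit
    and that t_max is positive.
    """
    if t < t_warmup:
        # Linear warmup: the multiplier rises from 0 toward 1.
        return t / t_warmup
    # Fraction of post-warmup time elapsed, clipped to [0, 1].
    tau_w = min(1.0, max(0.0, (t - t_warmup) / t_max))
    # Decreasing half of a cosine curve, from 1 down to alpha_f.
    return alpha_f + (1.0 - alpha_f) * 0.5 * (1.0 + math.cos(math.pi * tau_w))


# Sample the multiplier at a few points of a schedule with a
# 100-step warmup, t_max of 1000 steps, and a floor of 0.1.
for step in (0, 50, 100, 600, 1100):
    print(step, cosine_warmup_multiplier(step, t_warmup=100, t_max=1000, alpha_f=0.1))
```

Under these assumptions the multiplier reaches 1 exactly when warmup ends and stays at `alpha_f` once the clipped fraction reaches 1.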
Warning
By default, initial warmup time is not scaled according to any provided scale schedule ratio. To change this behavior, set scale_warmup=True.