ComposerScheduler
- class composer.optim.ComposerScheduler
Specification for a stateless scheduler function.
While this specification is provided as a Python class, an ordinary function can implement this interface as long as it matches the signature of this interface's __call__() method. For example, a scheduler that halves the learning rate after 10 epochs could be written as:
def ten_epoch_decay_scheduler(state: State) -> float:
    if state.timestamp.epoch < 10:
        return 1.0
    return 0.5

# ten_epoch_decay_scheduler is a valid ComposerScheduler
trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler],
    ...
)
In order to allow schedulers to be configured, schedulers may also be written as callable classes:
class VariableEpochDecayScheduler(ComposerScheduler):

    def __init__(self, num_epochs: int):
        self.num_epochs = num_epochs

    def __call__(self, state: State) -> float:
        if state.timestamp.epoch < self.num_epochs:
            return 1.0
        return 0.5

ten_epoch_decay_scheduler = VariableEpochDecayScheduler(num_epochs=10)

# ten_epoch_decay_scheduler is also a valid ComposerScheduler
trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler],
    ...
)
The constructions of ten_epoch_decay_scheduler in each of the examples above are equivalent. Note that neither scheduler uses the scale_schedule_ratio parameter. As long as this parameter is not used when initializing Trainer, it is not required that any schedulers implement that parameter.

- __call__(state, ssr=1.0)
Calculate the current learning rate multiplier \(\alpha\).
A scheduler function should be a pure function that returns a multiplier to apply to the optimizer's provided learning rate, given the current trainer state, and optionally a "scale schedule ratio" (SSR). A typical implementation will read state.timestamp, and possibly other fields like state.max_duration, to determine the trainer's latest temporal progress.
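For instance, a scheduler that reads both fields to cut the learning rate halfway through training might look like the following sketch. The name halfway_decay_scheduler is illustrative, and it assumes max_duration is expressed in epochs:

from composer.core import State

def halfway_decay_scheduler(state: State) -> float:
    # Keep the full multiplier for the first half of training,
    # then decay to one tenth of the base learning rate.
    halfway_epoch = state.max_duration.value / 2  # assumes max_duration is in epochs
    if state.timestamp.epoch.value < halfway_epoch:
        return 1.0
    return 0.1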
Note

All instances of ComposerScheduler output a multiplier for the learning rate, rather than the learning rate directly. By convention, we use the symbol \(\alpha\) to refer to this multiplier. This means that the learning rate \(\eta\) at time \(t\) can be represented as \(\eta(t) = \eta_i \times \alpha(t)\), where \(\eta_i\) represents the learning rate used to initialize the optimizer. For example, if the optimizer is initialized with \(\eta_i = 0.1\) and the scheduler returns \(\alpha(t) = 0.5\), the effective learning rate at time \(t\) is \(0.05\).

Note
It is possible to use multiple schedulers, in which case their effects will stack multiplicatively.
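For instance, continuing the first example above, pairing ten_epoch_decay_scheduler with a second scheduler that always returns 0.5 (a hypothetical constant_half, used only for illustration) applies the product of the two multipliers:

def constant_half(state: State) -> float:
    # Always halve the base learning rate.
    return 0.5

# Before epoch 10 the combined multiplier is 1.0 * 0.5 = 0.5;
# from epoch 10 onward it is 0.5 * 0.5 = 0.25.
trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler, constant_half],
    ...
)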
The ssr param indicates that the schedule should be "stretched" accordingly. In symbolic terms, where \(\alpha_\sigma(t)\) represents the scheduler output at time \(t\) using scale schedule ratio \(\sigma\):

\[\alpha_{\sigma}(t) = \alpha(t / \sigma)\]

- Parameters

state (State) – The current trainer state.

ssr (float, optional) – The scale schedule ratio by which to stretch the schedule. Default: 1.0.
- Returns
alpha (float) – A multiplier to apply to the optimizer's provided learning rate.
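As a sketch of how a scheduler might honor ssr, the linear decay below stretches its duration by the ratio, matching \(\alpha_{\sigma}(t) = \alpha(t / \sigma)\). The name linear_decay_scheduler is illustrative, and it assumes max_duration is expressed in epochs:

from composer.core import State

def linear_decay_scheduler(state: State, ssr: float = 1.0) -> float:
    # Decay the multiplier linearly from 1.0 to 0.0 over the stretched
    # training duration; assumes max_duration is expressed in epochs.
    total_epochs = state.max_duration.value * ssr
    current_epoch = state.timestamp.epoch.value
    return max(1.0 - current_epoch / total_epochs, 0.0)

With ssr=0.5 the multiplier reaches zero halfway through the original duration; with ssr=2.0 it decays half as fast.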