ComposerScheduler#

class composer.optim.ComposerScheduler[source]#

Specification for a stateless scheduler function.

While this specification is provided as a Python class, an ordinary function can implement this interface as long as it matches the signature of this interface's __call__() method.

For example, a scheduler that halves the learning rate after 10 epochs could be written as:

from composer import Trainer
from composer.core import State

def ten_epoch_decay_scheduler(state: State) -> float:
    if state.timestamp.epoch < 10:
        return 1.0
    return 0.5

# ten_epoch_decay_scheduler is a valid ComposerScheduler
trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler],
    ...
)

To allow schedulers to be configured, schedulers may also be written as callable classes:

class VariableEpochDecayScheduler(ComposerScheduler):

    def __init__(self, num_epochs: int):
        self.num_epochs = num_epochs

    def __call__(self, state: State) -> float:
        if state.timestamp.epoch < self.num_epochs:
            return 1.0
        return 0.5

ten_epoch_decay_scheduler = VariableEpochDecayScheduler(num_epochs=10)
# ten_epoch_decay_scheduler is also a valid ComposerScheduler
trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler],
    ...
)

The constructions of ten_epoch_decay_scheduler in each of the examples above are equivalent. Note that neither scheduler uses the scale_schedule_ratio parameter. As long as this parameter is not used when initializing the Trainer, schedulers are not required to implement it.
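A scheduler can opt in to the scale schedule ratio by accepting the ssr argument of __call__(), documented below. As a minimal sketch (the name ten_epoch_decay_scheduler_ssr is illustrative, and the stretching of the decay point by ssr follows the formula given below):

def ten_epoch_decay_scheduler_ssr(state: State, ssr: float = 1.0) -> float:
    # Stretch the decay point by the scale schedule ratio, so that
    # alpha_ssr(t) = alpha(t / ssr). Compare raw values to avoid mixing
    # a Time object with a float.
    if state.timestamp.epoch.value < 10 * ssr:
        return 1.0
    return 0.5

# Hypothetical run that compresses the schedule to half its length:
trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler_ssr],
    scale_schedule_ratio=0.5,
    ...
)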

__call__(state, ssr=1.0)[source]#

Calculate the current learning rate multiplier \(\alpha\).

A scheduler function should be a pure function that returns a multiplier to apply to the optimizer's provided learning rate, given the current trainer state, and optionally a "scale schedule ratio" (SSR). A typical implementation will read state.timestamp, and possibly other fields like state.max_duration, to determine the trainer's latest temporal progress.
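For instance, a linear decay over the full training run might be sketched as follows. This is a sketch only: it assumes state.max_duration is expressed in epochs, whereas a robust implementation would convert units explicitly.

def linear_decay_scheduler(state: State) -> float:
    # Assumes max_duration is given in epochs; .value yields the raw count.
    frac = state.timestamp.epoch.value / state.max_duration.value
    return max(1.0 - frac, 0.0)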

Note

All instances of ComposerScheduler output a multiplier for the learning rate, rather than the learning rate directly. By convention, we use the symbol \(\alpha\) to refer to this multiplier. This means that the learning rate \(\eta\) at time \(t\) can be represented as \(\eta(t) = \eta_i \times \alpha(t)\), where \(\eta_i\) represents the learning rate used to initialize the optimizer.
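For example, if the optimizer is initialized with \(\eta_i = 0.1\) and the scheduler returns \(\alpha(t) = 0.5\), the effective learning rate at time \(t\) is \(\eta(t) = 0.1 \times 0.5 = 0.05\).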

Note

It is possible to use multiple schedulers, in which case their effects will stack multiplicatively.
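For example, if one scheduler returns \(\alpha_1(t) = 0.5\) and another returns \(\alpha_2(t) = 0.1\) at the same time \(t\), the optimizer's learning rate is scaled by \(0.5 \times 0.1 = 0.05\).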

The ssr param indicates that the schedule should be "stretched" accordingly. In symbolic terms, where \(\alpha_\sigma(t)\) represents the scheduler output at time \(t\) using scale schedule ratio \(\sigma\):

\[\alpha_{\sigma}(t) = \alpha(t / \sigma) \]
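For example, with \(\sigma = 2\), a schedule that halves the multiplier at epoch 10 when \(\sigma = 1\) instead halves it at epoch 20, since \(\alpha_2(20) = \alpha(20 / 2) = \alpha(10)\).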
Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

Returns

alpha (float) – A multiplier to apply to the optimizer's provided learning rate.