ComposerScheduler#

class composer.optim.ComposerScheduler[source]#

Specification for a stateless scheduler function.

While this specification is provided as a Python class, an ordinary function can implement this interface as long as it matches the signature of this interface’s __call__() method.

For example, a scheduler that halves the learning rate after 10 epochs could be written as:

def ten_epoch_decay_scheduler(state: State) -> float:
    if state.timestamp.epoch < 10:
        return 1.0
    return 0.5

# ten_epoch_decay_scheduler is a valid ComposerScheduler
trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler],
    ...
)

To allow schedulers to be configured, they may also be written as callable classes:

class VariableEpochDecayScheduler(ComposerScheduler):

    def __init__(self, num_epochs: int):
        self.num_epochs = num_epochs

    def __call__(self, state: State) -> float:
        if state.timestamp.epoch < self.num_epochs:
            return 1.0
        return 0.5

ten_epoch_decay_scheduler = VariableEpochDecayScheduler(num_epochs=10)
# ten_epoch_decay_scheduler is also a valid ComposerScheduler
trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler],
    ...
)

The two constructions of ten_epoch_decay_scheduler above are equivalent. Note that neither scheduler uses the scale_schedule_ratio parameter; as long as this parameter is not passed when initializing the Trainer, schedulers are not required to implement it.
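
If scale_schedule_ratio were passed to the Trainer, a scheduler would also need to accept the optional ssr argument described below. As a rough sketch (the function name is illustrative only), the ten-epoch schedule could honor it like this:

def stretched_ten_epoch_decay_scheduler(state: State, ssr: float = 1.0) -> float:
    # Sketch only: halve the learning rate after 10 * ssr epochs, so the
    # schedule is "stretched" whenever the trainer supplies an ssr other than 1.0.
    if state.timestamp.epoch.value < 10 * ssr:
        return 1.0
    return 0.5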

__call__(state, ssr=1.0)[source]#

Calculate the current learning rate multiplier α.

A scheduler function should be a pure function that returns a multiplier to apply to the optimizer’s provided learning rate, given the current trainer state, and optionally a “scale schedule ratio” (SSR). A typical implementation will read state.timestamp, and possibly other fields like state.max_duration, to determine the trainer’s latest temporal progress.
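
For instance, a minimal sketch of such a scheduler, assuming max_duration was specified in epochs, might decay the multiplier linearly over the course of training:

def linear_decay_scheduler(state: State) -> float:
    # Sketch, assuming max_duration is expressed in epochs: decay the
    # multiplier linearly from 1.0 at the start of training to 0.0 at the end.
    assert state.max_duration is not None
    current_epoch = state.timestamp.epoch.value
    total_epochs = state.max_duration.value
    return max(0.0, 1.0 - current_epoch / total_epochs)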

Note

All instances of ComposerScheduler output a multiplier for the learning rate, rather than the learning rate directly. By convention, we use the symbol α to refer to this multiplier. This means that the learning rate η at time t can be represented as η(t) = η_i × α(t), where η_i represents the learning rate used to initialize the optimizer.

Note

It is possible to use multiple schedulers, in which case their effects will stack multiplicatively.
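
For example, passing both a warmup scheduler (warmup_scheduler here is hypothetical) and the ten-epoch scheduler above yields an effective multiplier equal to the product of the two:

trainer = Trainer(
    schedulers=[warmup_scheduler, ten_epoch_decay_scheduler],  # effective alpha(t) = warmup(t) * decay(t)
    ...
)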

The ssr param indicates that the schedule should be “stretched” accordingly. In symbolic terms, where α_σ(t) represents the scheduler output at time t using scale schedule ratio σ:

α_σ(t) = α(t / σ)
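
For example, applying σ = 2 to the ten-epoch schedule above gives α_2(t) = α(t / 2), so the halving occurs after 20 epochs rather than 10; with σ = 0.5 it occurs after only 5 epochs.
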
Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time t with an SSR of 1.0 should be the same as that computed by this scheduler at time t × s with an SSR of s. Default = 1.0.

Returns

alpha (float) – A multiplier to apply to the optimizer’s provided learning rate.