composer.optim

Optimizers and learning rate schedulers.

Composer is compatible with optimizers based on PyTorch's native Optimizer API, including the common optimizers that ship with torch.optim. However, where applicable, it is recommended to use the optimizers provided in decoupled_weight_decay, since they improve on their PyTorch equivalents.
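
As a minimal sketch of this recommendation, the decoupled optimizers are intended as drop-in replacements for their torch.optim counterparts; the model and hyperparameter values below are illustrative placeholders, not recommended settings:

```python
import torch

from composer.optim import DecoupledAdamW

# A toy module standing in for your model's parameters.
model = torch.nn.Linear(16, 4)

# DecoupledAdamW mirrors torch.optim.AdamW's constructor arguments, but applies
# weight decay decoupled from the learning rate. Values here are illustrative.
optimizer = DecoupledAdamW(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.95),
    weight_decay=1e-5,
)
```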

PyTorch schedulers can be used with Composer, but this is explicitly discouraged. Instead, it is recommended to use schedulers based on Composer's ComposerScheduler API, which allows for more flexibility and easier configuration when writing schedulers.
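
To illustrate, here is a minimal sketch of a custom stateless scheduler written against the ComposerScheduler API. The halve_midway function is hypothetical, and the sketch assumes State.get_elapsed_duration() reports the fraction of training completed:

```python
from composer.core import State

def halve_midway(state: State, ssr: float = 1.0) -> float:
    """Hypothetical scheduler: returns a multiplier on the optimizer's base learning rate."""
    elapsed = state.get_elapsed_duration()  # fraction of training completed, in [0, 1]
    if elapsed is None or elapsed.value < 0.5:
        return 1.0  # full learning rate for the first half of training
    return 0.5  # halve the learning rate for the second half
```

Because the schedule is a pure function of the training state, it can be evaluated at any point in training; a callable like this can typically be passed to the Trainer's schedulers argument, or converted with compile_composer_scheduler when a PyTorch scheduler object is required.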

Functions

compile_composer_scheduler

Converts a stateless scheduler into a PyTorch scheduler object.

Classes

ComposerScheduler

Specification for a stateless scheduler function.

ConstantScheduler

Maintains a fixed learning rate.

ConstantWithWarmupScheduler

Maintains a fixed learning rate, with an initial warmup.

CosineAnnealingScheduler

Decays the learning rate according to the decreasing part of a cosine curve.

CosineAnnealingWarmRestartsScheduler

Cyclically decays the learning rate according to the decreasing part of a cosine curve.

CosineAnnealingWithWarmupScheduler

Decays the learning rate according to the decreasing part of a cosine curve, with an initial warmup.

DecoupledAdamW

Adam optimizer with the weight decay term decoupled from the learning rate.

DecoupledSGDW

SGD optimizer with the weight decay term decoupled from the learning rate.

ExponentialScheduler

Decays the learning rate exponentially.

LinearScheduler

Adjusts the learning rate linearly.

LinearWithWarmupScheduler

Adjusts the learning rate linearly, with an initial warmup.

MultiStepScheduler

Decays the learning rate discretely at fixed milestones.

MultiStepWithWarmupScheduler

Decays the learning rate discretely at fixed milestones, with an initial warmup.

PolynomialScheduler

Sets the learning rate to be proportional to a power of the fraction of training time left.

PolynomialWithWarmupScheduler

Decays the learning rate according to a power of the fraction of training time left, with an initial warmup.

StepScheduler

Decays the learning rate discretely at fixed intervals.
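
For reference, a short sketch of constructing two of the schedulers above; the time strings and hyperparameters are illustrative only:

```python
from composer.optim import CosineAnnealingWithWarmupScheduler, MultiStepScheduler

# Time strings use Composer's time units (e.g. "ep" for epochs, "ba" for batches);
# the values below are placeholders, not recommendations.
cosine = CosineAnnealingWithWarmupScheduler(t_warmup="5ep")
multistep = MultiStepScheduler(milestones=["30ep", "60ep", "80ep"], gamma=0.1)
```

Scheduler objects like these are typically passed to the Trainer via its schedulers argument alongside an optimizer.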