composer.optim

Modules

composer.optim.decoupled_weight_decay: Optimizers with weight decay decoupled from the learning rate.
composer.optim.optimizer_hparams: Hyperparameters for optimizers.
composer.optim.scheduler: Stateless learning rate schedulers.
composer.optim.scheduler_hparams: Hyperparameters for schedulers.

Optimizers and learning rate schedulers.

Composer is compatible with optimizers built on PyTorch’s native Optimizer API, and common optimizers such as SGD and Adam have been thoroughly tested with Composer. However, where applicable, it is recommended to use the optimizers provided in decoupled_weight_decay, since they improve upon their PyTorch equivalents by decoupling the weight decay term from the learning rate.
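For example, a decoupled optimizer is intended as a drop-in replacement for its torch.optim counterpart. The sketch below assumes DecoupledAdamW accepts the same keyword arguments as torch.optim.AdamW; the model and hyperparameter values are placeholders.

```python
import torch

from composer.optim import DecoupledAdamW

model = torch.nn.Linear(10, 2)  # stand-in for a real model

# Same call pattern as torch.optim.AdamW, but the weight decay term is
# applied independently of the learning rate during the update step.
optimizer = DecoupledAdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-5,
)
```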

PyTorch schedulers can be used with Composer, but this is explicitly discouraged. Instead, it is recommended to use schedulers based on Composer’s ComposerScheduler API, which allows for more flexibility and easier configuration when writing schedulers.
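As a rough illustration of the API, a ComposerScheduler is a stateless callable that receives the current training State and returns a multiplier on the optimizer’s initial learning rate. The signature below (a state argument plus an ssr scale-schedule-ratio argument) reflects typical Composer usage and should be treated as an assumption rather than a verbatim spec.

```python
from composer.core import State


def half_lr(state: State, ssr: float = 1.0) -> float:
    """Run at half of the initial learning rate for the whole of training.

    Returning a multiplier rather than an absolute value is what keeps the
    scheduler stateless: the effective LR is always initial_lr * multiplier,
    recomputed from the training State on every step.
    """
    return 0.5
```

Because the multiplier is recomputed from the training State each time it is queried, schedulers written this way carry no internal state of their own, which is what makes them straightforward to compose, rescale, and resume.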

Classes

ComposerScheduler: Specification for a stateless scheduler function.
ConstantScheduler: Maintains a fixed learning rate.
CosineAnnealingScheduler: Decays the learning rate according to the decreasing part of a cosine curve.
CosineAnnealingWarmRestartsScheduler: Cyclically decays the learning rate according to the decreasing part of a cosine curve.
CosineAnnealingWithWarmupScheduler: Decays the learning rate according to the decreasing part of a cosine curve, with an initial warmup.
DecoupledAdamW: Adam optimizer with the weight decay term decoupled from the learning rate.
DecoupledSGDW: SGD optimizer with the weight decay term decoupled from the learning rate.
ExponentialScheduler: Decays the learning rate exponentially.
LinearScheduler: Adjusts the learning rate linearly.
LinearWithWarmupScheduler: Adjusts the learning rate linearly, with an initial warmup.
MultiStepScheduler: Decays the learning rate discretely at fixed milestones.
MultiStepWithWarmupScheduler: Decays the learning rate discretely at fixed milestones, with an initial warmup.
PolynomialScheduler: Sets the learning rate to be proportional to a power of the fraction of training time left.
PolynomialWithWarmupScheduler: Decays the learning rate according to a power of the fraction of training time left, with an initial warmup.
StepScheduler: Decays the learning rate discretely at fixed intervals.
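To give a feel for how the scheduler classes above are parameterized, here is a brief sketch constructing two of them. The argument names (t_warmup, milestones, gamma) and the time-string values follow common Composer conventions and are best checked against each class docstring.

```python
from composer.optim import CosineAnnealingWithWarmupScheduler, MultiStepScheduler

# Warm up over the first epoch, then follow the decreasing half of a cosine
# curve for the remainder of training.
cosine = CosineAnnealingWithWarmupScheduler(t_warmup="1ep")

# Multiply the learning rate by 0.1 at epochs 30 and 60.
multistep = MultiStepScheduler(milestones=["30ep", "60ep"], gamma=0.1)
```

Either object would then typically be handed to the Trainer alongside an optimizer (for example via its schedulers argument) rather than stepped by hand.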

Hparams

These classes are used with yahp for YAML-based configuration; a usage sketch follows the list below.

AdamHparams: Hyperparameters for the Adam optimizer.
AdamWHparams: Hyperparameters for the AdamW optimizer.
ConstantSchedulerHparams: Hyperparameters for the ConstantScheduler scheduler.
CosineAnnealingSchedulerHparams: Hyperparameters for the CosineAnnealingScheduler scheduler.
CosineAnnealingWarmRestartsSchedulerHparams: Hyperparameters for the CosineAnnealingWarmRestartsScheduler scheduler.
CosineAnnealingWithWarmupSchedulerHparams: Hyperparameters for the CosineAnnealingWithWarmupScheduler scheduler.
DecoupledAdamWHparams: Hyperparameters for the DecoupledAdamW optimizer.
DecoupledSGDWHparams: Hyperparameters for the DecoupledSGDW optimizer.
ExponentialSchedulerHparams: Hyperparameters for the ExponentialScheduler scheduler.
LinearSchedulerHparams: Hyperparameters for the LinearScheduler scheduler.
LinearWithWarmupSchedulerHparams: Hyperparameters for the LinearWithWarmupScheduler scheduler.
MultiStepSchedulerHparams: Hyperparameters for the MultiStepScheduler scheduler.
MultiStepWithWarmupSchedulerHparams: Hyperparameters for the MultiStepWithWarmupScheduler scheduler.
OptimizerHparams: Base class for optimizer hyperparameter classes.
PolynomialSchedulerHparams: Hyperparameters for the PolynomialScheduler scheduler.
PolynomialWithWarmupSchedulerHparams: Hyperparameters for the PolynomialWithWarmupScheduler scheduler.
RAdamHparams: Hyperparameters for the RAdam optimizer.
RMSpropHparams: Hyperparameters for the RMSprop optimizer.
SGDHparams: Hyperparameters for the SGD optimizer.
SchedulerHparams: Base class for scheduler hyperparameter classes.
StepSchedulerHparams: Hyperparameters for the StepScheduler scheduler.
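As a rough sketch of the hparams flow: an Hparams object is a dataclass-style container that can be populated by hand or from a YAML file through yahp, and then converted into the runtime object it describes. The field names (lr, weight_decay) and the initialize_object(...) call below are assumptions based on typical Composer hparams usage, not a verbatim API listing.

```python
import torch

from composer.optim import DecoupledAdamWHparams

model = torch.nn.Linear(10, 2)  # stand-in for a real model

# Populate the hparams container directly; in a YAML-driven run, yahp would
# fill these same fields from a config file instead.
hparams = DecoupledAdamWHparams(lr=1e-3, weight_decay=1e-5)

# Turn the hyperparameter description into a concrete optimizer instance
# bound to the model's parameters.
optimizer = hparams.initialize_object(model.parameters())
```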