DecoupledSGDW

class composer.optim.DecoupledSGDW(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)[source]

SGD optimizer with the weight decay term decoupled from the learning rate.

NOTE: Since weight_decay is no longer scaled by lr, you will likely want to use much smaller values for weight_decay than you would if using torch.optim.SGD. In this optimizer, the value weight_decay translates exactly to: 'On every optimizer update, every weight element will be multiplied by (1.0 - weight_decay_t)'. The term weight_decay_t will follow the same schedule as lr_t but crucially will not be scaled by lr.
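
Concretely, a minimal sketch of the decay term only (assuming, per the schedule behavior described above, that weight_decay_t tracks the ratio lr / initial_lr; this is an illustration, not the library's source):

    # Sketch of the decoupled decay applied on every optimizer update.
    # weight_decay_t follows the lr schedule via lr / initial_lr, but is
    # never multiplied by lr itself.
    weight_decay_t = weight_decay * (lr / initial_lr)
    for p in params:
        p.mul_(1.0 - weight_decay_t)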

Argument defaults are copied from torch.optim.SGD.

Why use this optimizer? The standard SGD optimizer couples the weight decay term with the gradient calculation. This ties the optimal value of weight_decay to lr and can also hurt generalization in practice. For more details on why decoupling might be desirable, see Decoupled Weight Decay Regularization.

Parameters
  • params (iterable) – Iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – Learning rate.

  • momentum (float, optional) – Momentum factor. Default: 0.

  • dampening (float, optional) – Dampening factor applied to the momentum. Default: 0.

  • weight_decay (float, optional) – Decoupled weight decay factor. Default: 0.

  • nesterov (bool, optional) – Enables Nesterov momentum updates. Default: False.
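
For example, construction mirrors torch.optim.SGD; the model and hyperparameter values below are illustrative placeholders:

    import torch
    from composer.optim import DecoupledSGDW

    model = torch.nn.Linear(10, 2)  # placeholder model
    optimizer = DecoupledSGDW(
        model.parameters(),
        lr=0.1,
        momentum=0.9,
        weight_decay=1.0e-4,  # typically smaller than for torch.optim.SGD
    )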

static sgdw(params, d_p_list, momentum_buffer_list, *, weight_decay, momentum, lr, initial_lr, dampening, nesterov)[source]

Functional API that performs SGDW algorithm computation.

Parameters
  • params (list) – List of parameters to update.

  • d_p_list (list) – List of parameter gradients.

  • momentum_buffer_list (list) – List of momentum buffers.

  • weight_decay (float) – Decoupled weight decay factor.

  • momentum (float) – Momentum factor.

  • lr (float) – Learning rate.

  • initial_lr (float) – Initial learning rate.

  • dampening (float) – Dampening factor for momentum update.

  • nesterov (bool) – Enables Nesterov momentum updates.
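
Putting these arguments together, a minimal sketch of the SGDW computation (the momentum handling follows torch.optim.SGD's functional update, with the decoupled decay described above; an illustration, not the library's source):

    import torch

    def sgdw_sketch(params, d_p_list, momentum_buffer_list, *, weight_decay,
                    momentum, lr, initial_lr, dampening, nesterov):
        for i, param in enumerate(params):
            d_p = d_p_list[i]
            # Standard SGD momentum update, as in torch.optim.SGD.
            if momentum != 0:
                buf = momentum_buffer_list[i]
                if buf is None:
                    buf = torch.clone(d_p).detach()
                    momentum_buffer_list[i] = buf
                else:
                    buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
                d_p = d_p.add(buf, alpha=momentum) if nesterov else buf
            # Decoupled weight decay: follows the lr schedule via
            # lr / initial_lr, but is not multiplied by lr itself.
            if weight_decay != 0:
                param.mul_(1.0 - (lr / initial_lr) * weight_decay)
            # Gradient step, scaled by lr as usual.
            param.add_(d_p, alpha=-lr)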

step(closure=None)[source]

Performs a single optimization step.

Parameters

closure (callable, optional) – A closure that reevaluates the model and returns the loss.
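
The closure follows the standard torch.optim pattern, for example:

    # Standard torch.optim closure pattern; model, loss_fn, inputs, and
    # targets are placeholders for your own objects.
    def closure():
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        return loss

    loss = optimizer.step(closure)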