- class composer.callbacks.OptimizerMonitor(log_optimizer_metrics=True)
Computes and logs the L2 norm of gradients as well as any optimizer-specific metrics implemented in the optimizer’s report_per_parameter_metrics method.
L2 norms are calculated after the reduction of gradients across GPUs. The callback iterates over the parameters of the model, which may reduce throughput while training large models. To ensure the norm is correct, it should run after gradient unscaling in cases where gradients are scaled.
>>> from composer import Trainer
>>> from composer.callbacks import OptimizerMonitor
>>> # constructing trainer object with this callback
>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_dataloader,
...     optimizers=optimizer,
...     max_duration="1ep",
...     callbacks=[OptimizerMonitor()],
... )
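For intuition about what the gradient-norm computation above involves, here is a minimal sketch in plain PyTorch. It is illustrative only, not the callback's actual implementation: it ignores gradient unscaling and distributed reduction, and the function name global_grad_l2_norm is hypothetical.

```python
import torch

def global_grad_l2_norm(model: torch.nn.Module) -> float:
    """Illustrative sketch: square root of the sum of squared
    per-parameter gradient L2 norms, as computed after backward()."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            # Accumulate the squared L2 norm of each parameter's gradient.
            total += torch.linalg.vector_norm(p.grad).item() ** 2
    return total ** 0.5
```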
The metrics are logged by the Logger to the keys described below. grad_l2_norm and layer_grad_l2_norm are logged in addition to any metrics produced by the optimizer's report_per_parameter_metrics method. For convenience, the metrics logged by DecoupledAdamW are listed below.
| Key | Logged data |
| --- | --- |
| l2_norm/grad/global | L2 norm of the gradients of all parameters in the model on the Event.AFTER_TRAIN_BATCH event. |
| l2_norm/grad/LAYER_NAME | Layer-wise L2 norms of the gradients. |
| l2_norm/moment/LAYER_NAME | Layer-wise L2 norms of the Adam first moment after calling optimizer step. |
| l2_norm_ratio/moment_grad/LAYER_NAME | Layer-wise ratio of the gradient norm to the moment norm after calling optimizer step. |
| cosine/moment_grad/LAYER_NAME | Layer-wise cosine angle between gradient and moment after calling optimizer step. |
| l2_norm/param/LAYER_NAME | Layer-wise L2 norms of parameter weights. |
| l2_norm/second_moment_sqrt/LAYER_NAME | Layer-wise L2 norms of the square root of the Adam second moment. |
| l2_norm/update/LAYER_NAME | Layer-wise L2 norms of the step. |
| cosine/update_grad/LAYER_NAME | Layer-wise cosine between the gradient and the step. |
| l2_norm_ratio/update_param/LAYER_NAME | Layer-wise ratio between the step size and the parameter norm. |
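To make the table concrete, the sketch below pulls a few of these keys out of an InMemoryLogger after training. This is a hedged example, not prescribed usage: it assumes model, train_dataloader, and optimizer are defined as in the example above, and the exact tuple layout stored in InMemoryLogger.data has varied across Composer versions, so treat the unpacking as illustrative.

```python
from composer import Trainer
from composer.callbacks import OptimizerMonitor
from composer.loggers import InMemoryLogger

# Attach an in-memory logger alongside the monitor so logged
# metrics can be inspected programmatically after fit().
logger = InMemoryLogger()
trainer = Trainer(
    model=model,                        # assumed defined as in the example above
    train_dataloader=train_dataloader,  # assumed defined as above
    optimizers=optimizer,               # assumed defined as above
    max_duration="1ep",
    callbacks=[OptimizerMonitor()],
    loggers=[logger],
)
trainer.fit()

# InMemoryLogger.data maps each logged key to a time series; in recent
# Composer versions each entry is a (timestamp, value) tuple.
for key, series in logger.data.items():
    if key.startswith("l2_norm/grad/"):
        print(key, series[-1][1])  # most recently logged value
```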