class composer.callbacks.OptimizerMonitor(log_optimizer_metrics=True, batch_log_interval=10)

Computes and logs the L2 norm of gradients as well as any optimizer-specific metrics implemented in the optimizer's report_per_parameter_metrics method.

L2 norms are calculated after the reduction of gradients across GPUs. This function iterates over the parameters of the model and may cause a reduction in throughput while training large models. In order to ensure the correctness of the norm, this function should be called after gradient unscaling in cases where gradients are scaled.
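The global norm is the square root of the sum of the squared per-layer gradient norms. The following is a minimal sketch of that computation for a standard PyTorch model; the helper and the dictionary keys are illustrative (the key names follow the prose below), not Composer's actual implementation.

import torch

def gradient_l2_norms(model: torch.nn.Module) -> dict:
    """Hypothetical helper: global and layer-wise gradient L2 norms."""
    layer_norms = {}
    for name, param in model.named_parameters():
        if param.grad is not None:
            # Norm of this parameter's (already reduced and unscaled) gradient.
            layer_norms[f'layer_grad_l2_norm/{name}'] = torch.linalg.vector_norm(param.grad).item()
    # Global L2 norm over all gradients = sqrt of the sum of squared layer norms.
    grad_l2_norm = sum(v ** 2 for v in layer_norms.values()) ** 0.5
    return {'grad_l2_norm': grad_l2_norm, **layer_norms}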


>>> from composer import Trainer
>>> from composer.callbacks import OptimizerMonitor
>>> # constructing trainer object with this callback
>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_dataloader,
...     optimizers=optimizer,
...     max_duration="1ep",
...     callbacks=[OptimizerMonitor()],
... )
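
Both keyword arguments shown in the signature above can be overridden when constructing the callback. For example, assuming batch_log_interval sets the logging frequency in batches:

>>> # disable optimizer-specific metrics and log every 100 batches instead of 10
>>> monitor = OptimizerMonitor(log_optimizer_metrics=False, batch_log_interval=100)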

The metrics are logged by the Logger to the keys described below. grad_l2_norm and layer_grad_l2_norm are logged in addition to the metrics logged by the optimizer's report_per_parameter_metrics method. For convenience, we have listed the metrics logged by DecoupledAdamW below.


Logged data:

- L2 norm of the gradients of all parameters in the model, logged on the Event.AFTER_TRAIN_BATCH event.
- Layer-wise L2 norms of the gradients.
- Layer-wise L2 norms of the Adam first moment, after calling the optimizer step.
- Layer-wise L2 norms of the parameter weights.
- Layer-wise L2 norms of the step (parameter update).
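
For intuition, the layer-wise moment and parameter norms described above can be approximated from a standard torch.optim.AdamW state dictionary. This is only a sketch of the quantities involved under that assumption; it is not how DecoupledAdamW's report_per_parameter_metrics computes or names them.

import torch

def adamw_layer_metrics(model: torch.nn.Module, optimizer: torch.optim.AdamW) -> dict:
    """Illustrative only: layer-wise first-moment and parameter-weight L2 norms."""
    metrics = {}
    with torch.no_grad():
        for name, param in model.named_parameters():
            state = optimizer.state.get(param, {})
            if 'exp_avg' in state:
                # First moment (exp_avg) after the most recent optimizer step.
                metrics[f'layer_moment_l2_norm/{name}'] = torch.linalg.vector_norm(state['exp_avg']).item()
            # Current parameter-weight norm.
            metrics[f'layer_param_l2_norm/{name}'] = torch.linalg.vector_norm(param).item()
    return metrics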