Classes

GradMonitor – Computes and logs the L2 norm of gradients on the Event.AFTER_TRAIN_BATCH event.

L2 norms are calculated after gradients have been reduced across GPUs. The callback iterates over all parameters of the model, which may reduce throughput when training large models. If gradients are scaled, the callback must run after gradient unscaling; otherwise the logged norm is incorrect.
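To illustrate why the ordering relative to unscaling matters, here is a minimal sketch in plain Python. It is not Composer's implementation: the helper `l2_norm`, the gradient values, and the loss scale of 1024 are assumptions chosen for illustration. A norm computed on still-scaled gradients is inflated by the scale factor.

```python
import math

def l2_norm(values):
    """Return the L2 norm of a flat sequence of floats."""
    return math.sqrt(sum(v * v for v in values))

# Hypothetical gradient values and loss scale, for illustration only.
true_grads = [3.0, 4.0]
loss_scale = 1024.0

# Gradients as they would look before unscaling.
scaled_grads = [g * loss_scale for g in true_grads]

# Norm taken too early (on scaled gradients) versus after unscaling.
norm_before_unscale = l2_norm(scaled_grads)
norm_after_unscale = l2_norm([g / loss_scale for g in scaled_grads])
```

Here `norm_before_unscale` is larger than `norm_after_unscale` by exactly the loss scale, so only the second value reflects the true gradient magnitude.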

Example

>>> from composer import Trainer
>>> from composer.callbacks import GradMonitor
>>> # constructing trainer object with this callback
>>> trainer = Trainer(
...     model=model,
...     optimizers=optimizer,
...     max_duration="1ep",
...     callbacks=[GradMonitor()],
... )


The Logger records the L2 norms under the following keys.

Key – Logged data

grad_l2_norm/step – L2 norm of the gradients of all parameters in the model on the Event.AFTER_TRAIN_BATCH event.

layer_grad_l2_norm/LAYER_NAME – Layer-wise L2 norms if log_layer_grad_norms is True. Default: False.
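As a rough sketch of how the two kinds of keys relate, the global norm is the square root of the sum of the squared layer-wise norms. The plain-Python code below is not the library's code: the function `grad_norm_metrics`, the layer names, and the gradient values are hypothetical and only mirror the key names listed above.

```python
import math

def grad_norm_metrics(named_grads, log_layer_grad_norms=False):
    """Build a metrics dict from a mapping of layer name -> flat list of gradient values."""
    metrics = {}
    total_sq = 0.0
    for name, grad in named_grads.items():
        # Squared L2 norm of this layer's gradient.
        layer_sq = sum(g * g for g in grad)
        total_sq += layer_sq
        if log_layer_grad_norms:
            metrics[f"layer_grad_l2_norm/{name}"] = math.sqrt(layer_sq)
    # Global norm over all parameters.
    metrics["grad_l2_norm/step"] = math.sqrt(total_sq)
    return metrics

# Hypothetical layer names and gradient values.
named_grads = {"fc1.weight": [3.0, 0.0], "fc2.weight": [0.0, 4.0]}
metrics = grad_norm_metrics(named_grads, log_layer_grad_norms=True)
```

With the flag left at its default of False, the sketch returns only the `grad_l2_norm/step` entry.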

Parameters

log_layer_grad_norms (bool, optional) – Whether to log the L2 norm of the gradients of each layer. Default: False.