Classes

GradMonitor

Computes and logs the L2 norm of gradients on the AFTER_TRAIN_BATCH event.

L2 norms are calculated after gradients have been reduced across GPUs. This method iterates over all parameters of the model and may therefore reduce training throughput for large models. When gradients are scaled (e.g., under mixed-precision training), this method should be called after gradient unscaling so that the reported norms are correct.
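The computation can be sketched as follows. This is a minimal, framework-free illustration (not the callback's actual implementation): gradients are represented as plain lists of floats keyed by a hypothetical layer name, standing in for the gradients yielded by a model's parameters.

```python
import math

def grad_l2_norms(named_grads):
    """Compute per-layer and overall L2 norms of gradients.

    named_grads: dict mapping layer name -> flattened list of gradient
    values (a hypothetical stand-in for a model's parameter gradients).
    """
    # Per-layer norm: sqrt of the sum of squared gradient entries.
    layer_norms = {
        name: math.sqrt(sum(g * g for g in grads))
        for name, grads in named_grads.items()
    }
    # The overall norm over all parameters equals the L2 norm of the
    # per-layer norms (squares sum across the concatenated gradients).
    overall = math.sqrt(sum(n * n for n in layer_norms.values()))
    return overall, layer_norms

grads = {"fc1.weight": [3.0, 4.0], "fc2.weight": [0.0]}
overall, per_layer = grad_l2_norms(grads)
# per_layer["fc1.weight"] == 5.0 and overall == 5.0
```

Because the per-layer norms determine the overall norm, logging layer-wise norms adds no extra passes over the gradients, only extra logged keys.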

Example

>>> from composer import Trainer
>>> from composer.callbacks import GradMonitor
>>> # constructing trainer object with this callback
>>> trainer = Trainer(
...     model=model,
...     optimizers=optimizer,
...     max_duration="1ep",
...     callbacks=[GradMonitor()],
... )


The L2 norms are logged by the Logger to the following keys:

Key                              Logged data
grad_l2_norm/step                L2 norm of the gradients of all parameters in the model on the AFTER_TRAIN_BATCH event
layer_grad_l2_norm/LAYER_NAME    Layer-wise L2 norms, logged only if log_layer_grad_norms is True (default: False)

Parameters

log_layer_grad_norms (bool, optional) – Whether to also log the L2 norm of each layer's gradients. Defaults to False.