OptimizerMonitor
- class composer.callbacks.OptimizerMonitor(log_optimizer_metrics=True, batch_log_interval=1)
Computes and logs the L2 norm of gradients as well as any optimizer-specific metrics implemented in the optimizer's report_per_parameter_metrics method.

L2 norms are calculated after the reduction of gradients across GPUs. This function iterates over the parameters of the model and may reduce throughput when training large models. To ensure the norm is correct, this function should be called after gradient unscaling in cases where gradients are scaled.

Example

    >>> from composer import Trainer
    >>> from composer.callbacks import OptimizerMonitor
    >>> # constructing trainer object with this callback
    >>> trainer = Trainer(
    ...     model=model,
    ...     train_dataloader=train_dataloader,
    ...     eval_dataloader=eval_dataloader,
    ...     optimizers=optimizer,
    ...     max_duration="1ep",
    ...     callbacks=[OptimizerMonitor()],
    ... )

The metrics are logged by the Logger to the keys described below. grad_l2_norm and layer_grad_l2_norm are logged in addition to metrics logged by the optimizer's report_per_parameter_metrics method. For convenience, the metrics logged by DecoupledAdamW are listed below.

| Key | Logged data |
| --- | --- |
| l2_norm/grad/global | L2 norm of the gradients of all parameters in the model on the Event.AFTER_TRAIN_BATCH event. |
| l2_norm/grad/LAYER_NAME | Layer-wise L2 norms |
| l2_norm/moment/LAYER_NAME | Layer-wise L2 norms of Adam first moment after calling optimizer step. |
| l2_norm_ratio/moment_grad/LAYER_NAME | Layer-wise ratio of the gradient norm to the moment norm after calling optimizer step. |
| cosine/moment_grad/LAYER_NAME | Layer-wise cosine angle between gradient and moment after calling optimizer step. |
| l2_norm/param/LAYER_NAME | Layer-wise L2 norms of parameter weights |
| l2_norm/second_moment_sqrt/LAYER_NAME | Layer-wise L2 norms of the square root of the Adam second moment |
| l2_norm/update/LAYER_NAME | Layer-wise L2 norms of the step |
| cosine/update_grad/LAYER_NAME | Layer-wise cosine between the gradient and the step |
| l2_norm_ratio/update_param/LAYER_NAME | Layer-wise ratio between step size and parameter norm |
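
To make a few of the keys above concrete, here is a minimal pure-Python sketch of how the gradient and moment metrics relate to one another. It is not Composer's implementation: the helper names (`l2_norm`, `cosine`, `monitor_metrics`) and the toy inputs are hypothetical, and tensors are stood in for by flat lists of floats. The only facts it relies on are the definitions in the table and the identity that the global L2 norm is the square root of the sum of the squared layer-wise norms.

```python
import math


def l2_norm(values):
    """L2 norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def cosine(a, b):
    """Cosine of the angle between two flat vectors (0.0 if either is all zeros)."""
    denom = l2_norm(a) * l2_norm(b)
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom


def monitor_metrics(grads, moments):
    """Hypothetical helper: build a metrics dict from per-layer gradients and moments.

    Both arguments map a layer name to a flat list of floats.
    """
    metrics = {}
    squared_sum = 0.0
    for name, grad in grads.items():
        grad_norm = l2_norm(grad)
        moment_norm = l2_norm(moments[name])
        squared_sum += grad_norm ** 2
        metrics[f"l2_norm/grad/{name}"] = grad_norm
        metrics[f"l2_norm/moment/{name}"] = moment_norm
        # Ratio of gradient norm to moment norm, following the table's wording.
        metrics[f"l2_norm_ratio/moment_grad/{name}"] = (
            grad_norm / moment_norm if moment_norm > 0.0 else float("nan")
        )
        metrics[f"cosine/moment_grad/{name}"] = cosine(grad, moments[name])
    # The global norm is the square root of the summed squared layer norms.
    metrics["l2_norm/grad/global"] = math.sqrt(squared_sum)
    return metrics


# Toy values chosen so the norms are easy to check by hand.
grads = {"layer1.weight": [3.0, 4.0], "layer2.weight": [0.0, 12.0]}
moments = {"layer1.weight": [3.0, 4.0], "layer2.weight": [12.0, 0.0]}
metrics = monitor_metrics(grads, moments)
for key in sorted(metrics):
    print(key, metrics[key])
```

With these toy inputs the layer-wise gradient norms are 5 and 12, so the global norm is 13. The first layer's gradient and moment point the same way (cosine 1), while the second layer's are orthogonal (cosine 0) even though their norms match. This shows why the cosine and ratio keys carry different information.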