SpeedMonitor
- class composer.callbacks.SpeedMonitor(window_size=100, gpu_flops_available=None, time_unit='hours')[source]
Logs the training throughput and utilization.

The training throughput is logged on the Event.BATCH_END event once the window_size threshold has been reached. If the model has a flops_per_batch attribute, flops per second is also logged. If running on a known GPU type, or if gpu_flops_available is set, MFU (model flops utilization) is also logged. All metrics are also logged per device by dividing by the world size.

To compute flops_per_sec, the model attribute flops_per_batch should be set to a callable that accepts a batch and returns the number of flops for that batch. Typically this is flops per sample times the batch size, unless pad tokens are used.
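For example, a minimal sketch of such a callable, assuming batches arrive as a dict with an input_ids tensor and using the common 6 * parameters * tokens transformer heuristic (an approximation, not a formula Composer requires):

>>> num_params = sum(p.numel() for p in model.parameters())
>>> def flops_per_batch(batch):
...     # Assumes batch is a dict with an 'input_ids' tensor of shape
...     # (batch_size, sequence_length); pad tokens are counted here.
...     return 6 * num_params * batch['input_ids'].numel()
...
>>> model.flops_per_batch = flops_per_batch

If your batches include padding and you do not want pad tokens counted, subtract them from the token count before multiplying.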
The wall clock time is logged on every Event.BATCH_END event.

Example
>>> from composer import Trainer
>>> from composer.callbacks import SpeedMonitor
>>> # constructing trainer object with this callback
>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_dataloader,
...     optimizers=optimizer,
...     max_duration='1ep',
...     callbacks=[SpeedMonitor(window_size=100)],
... )
The training throughput is logged by the Logger to the following keys, as described below.

| Key | Logged data |
| --- | --- |
| throughput/batches_per_sec | Rolling average (over the window_size most recent batches) of the number of batches processed per second |
| throughput/samples_per_sec | Rolling average (over the window_size most recent batches) of the number of samples processed per second |
| throughput/tokens_per_sec | Rolling average (over the window_size most recent batches) of the number of tokens processed per second. Only logged if the dataspec returns tokens per batch |
| throughput/flops_per_sec | Estimated flops, computed as flops_per_batch * batches_per_sec. Only logged if the model has a flops_per_batch attribute |
| throughput/device/batches_per_sec | throughput/batches_per_sec divided by the world size |
| throughput/device/samples_per_sec | throughput/samples_per_sec divided by the world size |
| throughput/device/tokens_per_sec | throughput/tokens_per_sec divided by the world size. Only logged if the dataspec returns tokens per batch |
| throughput/device/flops_per_sec | throughput/flops_per_sec divided by the world size. Only logged if the model has a flops_per_batch attribute |
| throughput/device/mfu | throughput/device/flops_per_sec divided by the flops available on the GPU device. Only logged if the model has a flops_per_batch attribute and gpu_flops_available is known, either determined automatically by SpeedMonitor or passed as an argument |
| time/train | Total elapsed training time |
| time/val | Total elapsed validation time |
| time/total | Total elapsed time (time/train + time/val) |
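To make the throughput/device/mfu computation concrete, here is an illustrative calculation; the numbers are hypothetical (312e12 approximates an A100's published bf16 peak, and the achieved figure is invented):

>>> device_flops_per_sec = 100e12   # assumed achieved flops per GPU
>>> gpu_flops_available = 312e12    # assumed peak flops for the device
>>> round(device_flops_per_sec / gpu_flops_available, 3)
0.321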
- Parameters
window_size (int, optional) – Number of batches to use for a rolling average of throughput. Defaults to 100.
gpu_flops_available (float, optional) – Number of flops available on the GPU. If not set, SpeedMonitor will attempt to determine this automatically. Defaults to None.
time_unit (str, optional) – Time unit to use for time logging. Can be one of 'seconds', 'minutes', 'hours', or 'days'. Defaults to 'hours'.
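Putting the parameters together, a construction sketch (the peak-flops value below is an assumed A100 bf16 figure, used only to show how to bypass automatic detection):

>>> from composer.callbacks import SpeedMonitor
>>> speed_monitor = SpeedMonitor(
...     window_size=50,              # average throughput over the last 50 batches
...     gpu_flops_available=312e12,  # assumed peak flops; skips auto-detection
...     time_unit='minutes',         # log time/train, time/val, time/total in minutes
... )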