composer.callbacks.speed_monitor#
Monitor throughput during training.
Classes

SpeedMonitor – Logs the training throughput.
- class composer.callbacks.speed_monitor.SpeedMonitor(window_size=100)[source]#
Bases:
composer.core.callback.Callback
Logs the training throughput.
The training throughput, in samples per second, is logged on the
BATCH_END
event once window_size batches have been seen. The per-epoch average throughput and the wall-clock training time are also logged on the EPOCH_END
event.

Example
>>> from composer.callbacks import SpeedMonitor
>>> # constructing trainer object with this callback
>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_dataloader,
...     optimizers=optimizer,
...     max_duration="1ep",
...     callbacks=[SpeedMonitor(window_size=100)],
... )
The training throughput is logged by the
Logger
to the following keys, as described below.

Key | Logged data
throughput/step | Rolling average (over the window_size most recent batches) of the number of samples processed per second
throughput/epoch | Number of samples processed per second (averaged over an entire epoch)
wall_clock_train | Total elapsed training time
- Parameters
  window_size (int, optional) – Number of batches to use for a rolling average of throughput. Defaults to 100.
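To illustrate what a window_size-batch rolling average of throughput means, here is a minimal, hypothetical sketch (not Composer's actual implementation): batches are accumulated in a fixed-size window, and a samples-per-second figure is produced only once the window is full, mirroring the threshold behavior described above.

```python
from collections import deque


class RollingThroughput:
    """Hypothetical sketch of the rolling average that SpeedMonitor
    logs as ``throughput/step`` (names and structure are illustrative)."""

    def __init__(self, window_size: int = 100):
        # (num_samples, seconds) pairs for the most recent batches only.
        self.window = deque(maxlen=window_size)

    def batch_end(self, num_samples: int, batch_seconds: float):
        """Record a finished batch; return samples/sec once the window is full."""
        self.window.append((num_samples, batch_seconds))
        if len(self.window) < self.window.maxlen:
            return None  # below the window_size threshold: nothing logged yet
        total_samples = sum(n for n, _ in self.window)
        total_seconds = sum(t for _, t in self.window)
        return total_samples / total_seconds


monitor = RollingThroughput(window_size=3)
print(monitor.batch_end(32, 0.5))  # None: window not yet full
print(monitor.batch_end(32, 0.5))  # None
print(monitor.batch_end(32, 0.5))  # 64.0 samples/sec over the window
```

Because the deque has a fixed maxlen, the oldest batch falls out as each new one arrives, so the reported figure always reflects only the most recent window_size batches.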
- load_state_dict(state)[source]#
Restores the state of SpeedMonitor object.
- Parameters
state (Dict[str, Any]) – The state of the object, as previously returned by
state_dict()
- state_dict()[source]#
Returns a dictionary representing the internal state of the SpeedMonitor object.
The returned dictionary is pickle-able via
torch.save().

- Returns
  Dict[str, Any] – The state of the SpeedMonitor object
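The state_dict()/load_state_dict(state) pair exists so the monitor's rolling window survives a checkpoint. The stand-in class below is a hypothetical sketch of that round trip, not Composer's real SpeedMonitor (which tracks additional fields); it shows why the state is kept as plain containers so the dictionary stays pickle-able (e.g. via torch.save()).

```python
from collections import deque
from typing import Any, Dict


class MiniSpeedMonitor:
    """Hypothetical stand-in demonstrating the state_dict()/load_state_dict()
    round trip; the real SpeedMonitor stores more than this."""

    def __init__(self, window_size: int = 100):
        self.window_size = window_size
        # Sample counts of the most recent batches (the rolling window).
        self.batch_num_samples = deque(maxlen=window_size)

    def state_dict(self) -> Dict[str, Any]:
        # Plain list/dict only, so the result pickles cleanly.
        return {"batch_num_samples": list(self.batch_num_samples)}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        # Rebuild the bounded deque from the saved list.
        self.batch_num_samples = deque(state["batch_num_samples"],
                                       maxlen=self.window_size)


monitor = MiniSpeedMonitor(window_size=4)
for n in (32, 32, 16):
    monitor.batch_num_samples.append(n)

restored = MiniSpeedMonitor(window_size=4)
restored.load_state_dict(monitor.state_dict())
print(list(restored.batch_num_samples))  # [32, 32, 16]
```

After restoring, the new instance resumes its rolling average from exactly where the checkpointed one left off.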