MemoryMonitor
class composer.callbacks.MemoryMonitor
Logs the memory usage of the model.
This callback calls the torch memory stats API for CUDA (see torch.cuda.memory_stats()) on the Event.AFTER_TRAIN_BATCH event and reports several memory statistics.

Example
>>> from composer import Trainer
>>> from composer.callbacks import MemoryMonitor
>>> # constructing trainer object with this callback
>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_dataloader,
...     optimizers=optimizer,
...     max_duration="1ep",
...     callbacks=[MemoryMonitor()],
... )
The memory statistics are logged by the Logger to the keys described below.

Key: memory/{statistic}
Logged data: Several memory usage statistics are logged on the Event.AFTER_TRAIN_BATCH event.

The following statistics are recorded:
alloc_requests: Number of memory allocation requests received by the memory allocator.
free_requests: Number of memory free requests received by the memory allocator.
allocated_mem: Amount of allocated memory, in bytes.
active_mem: Amount of active memory, in bytes, at the time of recording.
inactive_mem: Amount of inactive, non-releasable memory, in bytes, at the time of recording.
reserved_mem: Amount of reserved memory, in bytes, at the time of recording.
alloc_retries: Number of failed cudaMalloc calls that resulted in a cache flush and retry.
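To illustrate how statistics like these become `memory/{statistic}` log keys, here is a minimal sketch. It is not composer's actual implementation: the mapping from raw `torch.cuda.memory_stats()` key names to the friendly statistic names is an assumption for illustration (the raw names follow torch's documented naming), and a plain dict stands in for the real stats so the sketch runs without a GPU.

```python
# Hypothetical mapping: raw torch.cuda.memory_stats() key -> logged name.
# This is a sketch, not composer's own table.
STAT_NAMES = {
    "allocation.all.allocated": "alloc_requests",
    "allocation.all.freed": "free_requests",
    "allocated_bytes.all.current": "allocated_mem",
    "active_bytes.all.current": "active_mem",
    "inactive_split_bytes.all.current": "inactive_mem",
    "reserved_bytes.all.current": "reserved_mem",
    "num_alloc_retries": "alloc_retries",
}

def to_logged_metrics(raw_stats: dict) -> dict:
    """Prefix each mapped statistic with ``memory/`` for logging."""
    return {
        f"memory/{name}": raw_stats[raw]
        for raw, name in STAT_NAMES.items()
        if raw in raw_stats
    }

# Mock dict standing in for torch.cuda.memory_stats() on a CUDA device.
mock_stats = {
    "allocation.all.allocated": 512,
    "allocated_bytes.all.current": 1_073_741_824,
    "num_alloc_retries": 0,
}
print(to_logged_metrics(mock_stats))
```

On each Event.AFTER_TRAIN_BATCH, the callback would log the resulting dictionary, so each key appears under `memory/` in whatever logger destinations the Trainer is configured with.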
Note
Memory usage monitoring is only supported for GPU devices.
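Because of this restriction, a script that may also run on CPU-only machines can guard the callback at construction time. The helper below is a hypothetical sketch, not part of composer's API; `torch.cuda.is_available()` is the standard torch check for a visible CUDA device.

```python
import importlib.util

def memory_monitor_supported() -> bool:
    """Hypothetical helper: True only when torch is installed and a CUDA
    device is visible, the condition MemoryMonitor needs to log anything."""
    if importlib.util.find_spec("torch") is None:
        return False  # torch is not installed at all
    import torch  # safe: we know the package exists
    return torch.cuda.is_available()

print(memory_monitor_supported())
```

In a training script you would then pass `callbacks=[MemoryMonitor()]` to the Trainer only when this returns True, and an empty callback list otherwise.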