MemoryMonitor
class composer.callbacks.MemoryMonitor(memory_keys=None)
Logs the memory usage of the model.

This callback calls the PyTorch CUDA memory statistics API (see torch.cuda.memory_stats()) on the Event.AFTER_TRAIN_BATCH event and reports several memory statistics.

Example

>>> from composer import Trainer
>>> from composer.callbacks import MemoryMonitor
>>> # constructing trainer object with this callback
>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_dataloader,
...     optimizers=optimizer,
...     max_duration="1ep",
...     callbacks=[MemoryMonitor()],
... )

The memory statistics are logged by the Logger to the following keys, as described below.

Key                   Logged data
memory/{statistic}    Several memory usage statistics, logged on the Event.AFTER_TRAIN_BATCH event.

The following statistics are recorded:

Statistic       Description
allocated_mem   Amount of allocated memory in gigabytes.
active_mem      Amount of active memory in gigabytes at the time of recording.
inactive_mem    Amount of inactive, non-releasable memory in gigabytes at the time of recording.
reserved_mem    Amount of reserved memory in gigabytes at the time of recording.
alloc_retries   Number of failed cudaMalloc calls that resulted in a cache flush and retry.

Note

Memory usage monitoring is only supported for GPU devices.
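The statistics above are derived from the dictionary returned by torch.cuda.memory_stats(). The following is a minimal sketch of how such a report could be assembled. It is not Composer's implementation. The names DEFAULT_MEMORY_KEYS, summarize_memory_stats, and report_current_device are hypothetical. The mapping from torch keys to statistic names is an assumption, although the torch keys themselves are real memory_stats() entries. "Gigabytes" is assumed to mean 10^9 bytes. The memory_keys argument only mimics the constructor parameter's name and may not match its exact semantics.

```python
from typing import Dict, Mapping, Optional

# Hypothetical mapping from torch.cuda.memory_stats() keys to the statistic
# names in the table above (assumption: not Composer's internal constant).
DEFAULT_MEMORY_KEYS: Dict[str, str] = {
    "allocated_bytes.all.current": "allocated_mem",
    "active_bytes.all.current": "active_mem",
    "inactive_split_bytes.all.current": "inactive_mem",
    "reserved_bytes.all.current": "reserved_mem",
    "num_alloc_retries": "alloc_retries",
}

BYTES_PER_GB = 1e9  # assumption: decimal gigabytes


def summarize_memory_stats(
    stats: Mapping[str, float],
    memory_keys: Optional[Mapping[str, str]] = None,
) -> Dict[str, float]:
    """Turn a memory_stats()-style dict into 'memory/{statistic}' metrics."""
    # An empty or missing mapping falls back to the default keys.
    memory_keys = memory_keys or DEFAULT_MEMORY_KEYS
    metrics: Dict[str, float] = {}
    for torch_key, name in memory_keys.items():
        if torch_key not in stats:
            continue  # skip statistics the allocator did not report
        value = stats[torch_key]
        if "bytes" in torch_key:
            value = value / BYTES_PER_GB  # byte counts become gigabytes
        metrics[f"memory/{name}"] = value  # counts such as retries stay as-is
    return metrics


def report_current_device() -> Dict[str, float]:
    """Query the live CUDA caching allocator (needs PyTorch and a GPU)."""
    import torch  # imported lazily so the sketch runs without PyTorch

    if not torch.cuda.is_available():
        raise RuntimeError("memory monitoring is only supported on GPU devices")
    return summarize_memory_stats(torch.cuda.memory_stats())


# Usage with made-up numbers standing in for a real memory_stats() result.
fake_stats = {
    "allocated_bytes.all.current": 2_500_000_000,
    "reserved_bytes.all.current": 4_000_000_000,
    "num_alloc_retries": 3,
}
print(summarize_memory_stats(fake_stats))
# {'memory/allocated_mem': 2.5, 'memory/reserved_mem': 4.0, 'memory/alloc_retries': 3}
```

Keeping the conversion in a pure function separate from the torch call lets the formatting logic run on machines without a GPU. Only report_current_device touches the CUDA allocator.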