composer.utils.collect_env#

Helpers to gather system information for debugging and bug reporting.

Leverages PyTorch's torch.utils.collect_env package to gather pertinent system information. The following information is additionally collected to facilitate Composer-specific debugging:

  • Composer version

  • Number of nodes

  • Host processor model name

  • Host processor physical core count

  • Number of accelerators per node

  • Accelerator model name

This package can be invoked as a standalone console script, or called from within an application, to generate a system environment report.

The module can be invoked by using the entrypoint alias:

$ composer_collect_env

Or manually as a standalone script:

$ python composer/utils/collect_env.py

To generate a system report from within a user application see print_env().

A custom excepthook wrapper is also provided which extends the original sys.excepthook() to automatically collect system information when an exception is raised.

To override the original sys.excepthook() see configure_excepthook().

By default, the Composer custom excepthook automatically generates the environment report. To disable automatic environment report generation, use the disable_env_report() helper function. Report generation can be re-enabled by using the enable_env_report() function.
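The wrapper-and-toggle behavior described above can be pictured with a minimal standard-library sketch. This is not Composer's implementation: `_report_enabled`, `enable_report`, `disable_report`, `build_report`, and `make_excepthook` are hypothetical names used only to illustrate chaining to the original hook and gating the report on a flag.

```python
import io
import platform
import sys

# Hypothetical module-level flag; stands in for whatever state the real helpers toggle.
_report_enabled = True


def enable_report() -> None:
    """Turn report generation on (hypothetical analogue of enable_env_report)."""
    global _report_enabled
    _report_enabled = True


def disable_report() -> None:
    """Turn report generation off (hypothetical analogue of disable_env_report)."""
    global _report_enabled
    _report_enabled = False


def build_report() -> str:
    """Return a tiny stand-in for an environment report."""
    return (
        f"Python version: {platform.python_version()}\n"
        f"Platform: {platform.platform()}"
    )


def make_excepthook(original_hook, file=None):
    """Wrap original_hook so that a report follows the normal exception output."""

    def custom_excepthook(exc_type, exc_value, exc_tb):
        # Always delegate to the original hook first, so the traceback is preserved.
        original_hook(exc_type, exc_value, exc_tb)
        if _report_enabled:
            print(build_report(), file=file if file is not None else sys.stderr)

    return custom_excepthook


# Demonstration: call the wrapped hook directly instead of installing it globally.
calls = []
buf = io.StringIO()
hook = make_excepthook(lambda *args: calls.append(args[0]), file=buf)
hook(ValueError, ValueError("boom"), None)  # report is written
disable_report()
hook(KeyError, KeyError("k"), None)  # original hook still runs, no report
```

Installing such a hook would amount to `sys.excepthook = make_excepthook(sys.excepthook)`; in Composer that role is played by configure_excepthook().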

Functions

configure_excepthook

Collect and print system information when sys.excepthook() is called.

disable_env_report

Disable environment report generation on exception.

enable_env_report

Enable environment report generation on exception.

print_env

Generate system information report.

class composer.utils.collect_env.ComposerEnv(composer_version, node_world_size, host_processor_model_name, host_processor_core_count, local_world_size, accelerator_model_name, cuda_device_count)[source]#

Bases: tuple
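Because the class derives from tuple and lists named fields in its signature, it behaves like a named tuple. The sketch below mirrors the field names from the signature above; the class name `ComposerEnvSketch` and the field types are assumptions made for illustration, and the values are taken from the sample report on this page.

```python
from typing import NamedTuple


class ComposerEnvSketch(NamedTuple):
    """Hypothetical stand-in with the same field names as ComposerEnv."""

    composer_version: str  # types here are assumed, not documented
    node_world_size: int
    host_processor_model_name: str
    host_processor_core_count: int
    local_world_size: int
    accelerator_model_name: str
    cuda_device_count: int


env = ComposerEnvSketch(
    composer_version="0.7.0",
    node_world_size=1,
    host_processor_model_name="AMD EPYC 7502 32-Core Processor",
    host_processor_core_count=64,
    local_world_size=1,
    accelerator_model_name="NVIDIA GeForce RTX 3080",
    cuda_device_count=4,
)

# Fields can be read by name or by position, like any tuple.
by_name = env.accelerator_model_name
by_index = env[0]
```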


composer.utils.collect_env.configure_excepthook()[source]#

Collect and print system information when sys.excepthook() is called.

The custom exception handler causes an exception message to be printed when sys.excepthook() is called. The exception message provides the user with information on the nature of the exception and directs the user to file GitHub issues as appropriate.

By default, the custom exception handler also generates an environment report users can attach to bug reports. Environment report generation can be optionally enabled/disabled by using the enable_env_report() and disable_env_report() helper functions, respectively.

Additionally, the custom excepthook checks whether the user is running from an IPython session and sets up the custom exception handler accordingly.
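One common way to detect an IPython session is sketched below. How Composer performs this check internally is not stated here, so the helper name `running_in_ipython` is hypothetical and the approach is an assumption; it relies only on `IPython.get_ipython()`, which returns None outside an interactive IPython shell.

```python
def running_in_ipython() -> bool:
    """Return True when called from inside an active IPython shell or kernel."""
    try:
        # IPython is optional; treat a missing install as "not in IPython".
        from IPython import get_ipython
    except ImportError:
        return False
    return get_ipython() is not None


in_ipython = running_in_ipython()
```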

To override the default sys.excepthook() with the custom except hook:

>>> configure_excepthook()
>>> sys.excepthook
<function _custom_exception_handler at ...>

composer.utils.collect_env.disable_env_report()[source]#

Disable environment report generation on exception.

composer.utils.collect_env.enable_env_report()[source]#

Enable environment report generation on exception.

composer.utils.collect_env.get_accel_model_name()[source]#

Query the accelerator name.

composer.utils.collect_env.get_composer_env()[source]#

Query Composer pertinent system information.

composer.utils.collect_env.get_composer_version()[source]#

Query the Composer version.

composer.utils.collect_env.get_cuda_device_count()[source]#

Get the number of CUDA devices on the system.

composer.utils.collect_env.get_host_processor_cores()[source]#

Determines the number of physical host processor cores.

composer.utils.collect_env.get_host_processor_name()[source]#

Query the host processor name.

composer.utils.collect_env.get_local_world_size()[source]#

Determines the number of accelerators per node.

composer.utils.collect_env.get_node_world_size()[source]#

Query the number of nodes.

composer.utils.collect_env.get_torch_env()[source]#

Query Torch system environment via torch.utils.collect_env.

composer.utils.collect_env.print_env(file=None)[source]#

Generate system information report.

Example:

from composer.utils.collect_env import print_env

print_env()

Sample Report:

---------------------------------
System Environment Report
Created: 2022-04-27 00:25:33 UTC
---------------------------------

PyTorch information
-------------------
PyTorch version: 1.9.1+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.27

Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.8.0-63-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3080
GPU 1: NVIDIA GeForce RTX 3080
GPU 2: NVIDIA GeForce RTX 3080
GPU 3: NVIDIA GeForce RTX 3080

Nvidia driver version: 470.57.02
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.9.1+cu111
[pip3] torch-optimizer==0.1.0
[pip3] torchmetrics==0.7.3
[pip3] torchvision==0.10.1+cu111
[pip3] vit-pytorch==0.27.0
[conda] Could not collect


Composer information
--------------------
Composer version: 0.7.0
Host processor model name: AMD EPYC 7502 32-Core Processor
Host processor core count: 64
Number of nodes: 1
Accelerator model name: NVIDIA GeForce RTX 3080
Accelerators per node: 1
CUDA Device Count: 4

Parameters

file (TextIO, optional) – File handle, sys.stdout or sys.stderr. Defaults to sys.stdout.
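The optional file argument follows the same pattern as the built-in print(). The sketch below illustrates that pattern with a hypothetical `print_report` function rather than print_env() itself, so it runs without Composer installed; the assumption is that None is resolved to sys.stdout at call time.

```python
import io
import sys
from typing import Optional, TextIO


def print_report(report: str, file: Optional[TextIO] = None) -> None:
    """Write report to file, falling back to sys.stdout when file is None."""
    if file is None:
        # Resolve the default at call time so a redirected stdout is respected.
        file = sys.stdout
    print(report, file=file)


# Capture the report in memory instead of printing it to the console.
buffer = io.StringIO()
print_report("Composer version: 0.7.0", file=buffer)
captured = buffer.getvalue()
```

Any object with a write() method, such as an open log file, can be passed the same way.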