🛕 File Uploading#

Composer supports uploading files, such as checkpoints and profiling traces, directly to third-party experiment trackers (e.g. Weights & Biases) and cloud storage backends (e.g. AWS S3).

What files might I want to upload?#

Checkpoints, profiling traces, and log files generated during training are the most common examples. Each file to upload must be a single local file. A collection of files can be combined into a single tarball, and the file to upload may live in a temporary folder.
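As a rough illustration of combining a collection of files into one tarball inside a temporary folder, the sketch below uses only the Python standard library. The helper name bundle_files and the default archive name are hypothetical and not part of Composer.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_files(paths, archive_name="bundle.tar.gz"):
    """Combine several local files into one gzip-compressed tarball.

    Hypothetical helper: the archive is written to a fresh temporary
    directory, and the path to the archive is returned.
    """
    tmp_dir = Path(tempfile.mkdtemp())
    archive_path = tmp_dir / archive_name
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            path = Path(path)
            # Store each file under its base name so the archive is flat.
            # Files that share a base name would collide in this sketch.
            tar.add(path, arcname=path.name)
    return archive_path
```

The returned path is a single local file, so it could then be passed as the file_path of an upload, with a remote file name such as 'traces.tar.gz'.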

Each remote file must have a name, which is independent of the file's local filepath. A remote backend is responsible for storing and organizing the file by the file's name. Uploading a file under an existing name should overwrite the previous remote file with that name. It is recommended that remote file names include file extensions.
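The naming rules can be pictured with a toy in-memory backend. This is only a sketch: InMemoryRemote is a hypothetical stand-in, not a Composer class, and real backends will differ in how they store data.

```python
class InMemoryRemote:
    """Hypothetical stand-in for a remote backend, keyed by remote file name."""

    def __init__(self):
        self.files = {}

    def upload(self, remote_file_name, file_path):
        # The local path is only read from; it has no influence on the
        # name under which the contents are stored.
        with open(file_path, "rb") as f:
            # Assigning to an existing key replaces the earlier contents,
            # mirroring the "same name overwrites" behavior.
            self.files[remote_file_name] = f.read()
```

Uploading two different local files under the remote name 'latest.pt' would leave only the contents of the second one in this toy backend.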

How are remote files generated?#

In Composer, individual classes, such as algorithms, callbacks, loggers, and profiler trace handlers, can generate files to be uploaded.

Once a file has been written to disk, the class should call upload_file(), and the centralized Logger will then pass the filepath and remote file name to all LoggerDestinations, which are ultimately responsible for uploading and storing remote files (more on that below).
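The hand-off described above can be sketched with two toy classes. ToyLogger and ToyDestination are hypothetical names used only for illustration; they do not reproduce Composer's Logger or LoggerDestination, whose real method signatures may take additional arguments.

```python
from pathlib import Path


class ToyDestination:
    """Hypothetical destination that records what it was asked to upload."""

    def __init__(self):
        self.received = []

    def upload_file(self, remote_file_name, file_path):
        # A real destination would send the file to its backend here.
        self.received.append((remote_file_name, Path(file_path)))


class ToyLogger:
    """Hypothetical central logger that fans each request out to all destinations."""

    def __init__(self, destinations):
        self.destinations = list(destinations)

    def upload_file(self, remote_file_name, file_path):
        # Every destination receives the same remote name and local path.
        for destination in self.destinations:
            destination.upload_file(remote_file_name, file_path)
```

In this sketch, the class that generated the file only talks to the central logger; it never needs to know which destinations are configured.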

Below are some examples of the classes that generate files that might be uploaded and the types of files they generate. For each class, see the linked API Reference for additional documentation.

Type           Class Name                  Description of Generated Files
-------------  --------------------------  -----------------------------------
Callback       CheckpointSaver             Training checkpoint files
Callback       ExportForInferenceCallback  Trained models in inference formats
Callback       MLPerfCallback              MLPerf submission files
Logger         FileLogger                  Log files
Logger         TensorboardLogger           TensorBoard TF event files
Trace Handler  JSONTraceHandler            Profiler trace files

Saving custom files#

It is also possible to upload custom files outside of an algorithm or callback. For example:

from composer import Trainer

# Construct the trainer
trainer = Trainer(...)

# Upload a custom file, such as a configuration YAML
trainer.logger.upload_file(
    remote_file_name='hparams.yaml',
    file_path='/path/to/hparams.yaml',
)

# Train!
trainer.fit()

How are files uploaded?#

To store files remotely, you must pass a LoggerDestination that implements the upload_file() method in the loggers argument of the Trainer constructor.

See also

The built-in WandBLogger and RemoteUploaderDownloader implement this method; see the examples below.

The centralized Composer Logger will invoke this method for all LoggerDestinations. If no LoggerDestination implements this method, then files will not be stored remotely.

Because LoggerDestinations can both generate and store files, there is a potential for a circular dependency. As such, it is important that any logger that generates files that are going to be uploaded (e.g. the Tensorboard Logger) does not also attempt to upload them. Otherwise, you could run into an infinite loop!

Where can I remotely store files?#

Composer includes three built-in LoggerDestinations to store artifacts:

  • The WandBLogger can upload Composer training files as W&B Artifacts, which are associated with the corresponding W&B project.

  • The NeptuneLogger can upload Composer training files as Neptune Files, which are associated with the corresponding Neptune run.

  • The RemoteUploaderDownloader can upload Composer training files to any cloud storage backend or remote filesystem. We include integrations for AWS S3 and SFTP (see the examples below), and you can write your own integration for a custom backend.
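To give a feel for what a custom backend integration might involve, here is a minimal sketch that treats a local directory as the "remote". The class name LocalDirectoryStore and its method names are assumptions chosen for illustration; they are not guaranteed to match the interface Composer expects, so consult the object store API reference before writing a real integration.

```python
import shutil
from pathlib import Path


class LocalDirectoryStore:
    """Hypothetical backend that stores objects as files under a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def upload_object(self, object_name, filename):
        # Copy the local file into the store under its object name.
        destination = self.root / object_name
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(filename, destination)

    def download_object(self, object_name, filename, overwrite=False):
        # Copy the stored object back to a local path, refusing to
        # clobber an existing file unless overwrite is requested.
        target = Path(filename)
        if target.exists() and not overwrite:
            raise FileExistsError(f"{target} already exists")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(self.root / object_name, target)
```

A real backend would replace the file copies with network calls, but the shape stays the same: one operation to store a local file under a name, and one to fetch it back.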

Why should I use built-in file uploading instead of uploading files manually?#

File uploading in Composer is optimized for efficiency. File uploads happen in background threads or processes, so the training loop is not blocked by network I/O. In other words, you can train on the next batch while the previous checkpoint is still being uploaded.
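The general idea of non-blocking uploads can be sketched with a queue and a single worker thread. This is not Composer's implementation; BackgroundUploader and its upload_fn parameter are hypothetical, and a production version would also need retries and a way to surface failures to the training loop.

```python
import queue
import threading


class BackgroundUploader:
    """Hypothetical uploader that runs upload_fn on a worker thread."""

    def __init__(self, upload_fn):
        self._upload_fn = upload_fn
        self._queue = queue.Queue()
        self.errors = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, remote_file_name, file_path):
        # Returns immediately; the caller is free to continue training.
        self._queue.put((remote_file_name, file_path))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:
                # Sentinel pushed by close(): stop the worker.
                break
            try:
                self._upload_fn(*item)
            except Exception as exc:
                # Record the failure instead of killing the worker thread.
                self.errors.append(exc)

    def close(self):
        # Let the worker drain everything queued so far, then wait for it.
        self._queue.put(None)
        self._worker.join()
```

Because enqueue() only places an item on the queue, the slow network call happens on the worker thread while the caller moves on to the next batch.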

Examples#

Below are some examples of how to configure Composer to upload files to various backends:

Weights & Biases Artifacts#

See also

The WandBLogger API Reference.

from composer.loggers import WandBLogger
from composer import Trainer

# Configure the logger
logger = WandBLogger(
    log_artifacts=True,  # enable artifact logging
)

# Define the trainer
trainer = Trainer(
    ...,
    loggers=logger,
)

# Train!
trainer.fit()

Neptune File upload#

See also

The NeptuneLogger API Reference.

from composer.loggers import NeptuneLogger
from composer import Trainer

# Configure the Neptune logger
logger = NeptuneLogger(
    upload_checkpoints=True,  # enable logging of checkpoint files
)

# Define the trainer
trainer = Trainer(..., loggers=logger)

# Train
trainer.fit()

S3 Objects#

To upload files to an S3 bucket, we'll need to configure the RemoteUploaderDownloader with the S3ObjectStore backend.

See also

The RemoteUploaderDownloader and S3ObjectStore API Reference.

from composer.loggers import RemoteUploaderDownloader
from composer.utils.object_store import S3ObjectStore
from composer import Trainer

# Configure the logger
logger = RemoteUploaderDownloader(
    bucket_uri="s3://my-bucket-name",
)

# Define the trainer
trainer = Trainer(
    ...,
    loggers=logger,
)

# Train!
trainer.fit()

SFTP Filesystem#

Similar to the S3 example above, we can upload files to a remote SFTP filesystem.

See also

The RemoteUploaderDownloader and SFTPObjectStore API Reference.

from composer.loggers import RemoteUploaderDownloader
from composer.utils.object_store import SFTPObjectStore
from composer import Trainer

# Configure the logger
logger = RemoteUploaderDownloader(
    bucket_uri="sftp://sftp_server.example.com",
)

# Define the trainer
trainer = Trainer(
    ...,
    loggers=logger,
)

# Train!
trainer.fit()