🏃‍♀️ Run Name#
The run_name is a string that names a specific training run. Naming your runs makes it easier to group and keep track of metrics, checkpoints, and other training artifacts.
In addition, your run_name will show up in many places as you use Composer.
Run Name Creation#
run_name is an optional argument to the Trainer.
There are two ways to get a run_name.
You can create your own run_name and pass it to the trainer, like so:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

from composer import Trainer
from composer.models import mnist_model

# Load MNIST as tensors and wrap it in a dataloader
transform = transforms.Compose([transforms.ToTensor()])
dataset = datasets.MNIST("data", train=True, download=True, transform=transform)
train_dataloader = DataLoader(dataset, batch_size=128)

# Choose a name for this run and pass it to the Trainer
run_name = 'my-cool-run-name'
trainer = Trainer(
    model=mnist_model(num_classes=10),
    train_dataloader=train_dataloader,
    max_duration="2ep",
    run_name=run_name,
)
trainer.fit()
Alternatively, you can let the trainer create a run_name for you. The generated name is a Unix timestamp followed by a coolname, e.g. "1657932618-infrared-ferret".
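To make the shape of a generated name concrete, the sketch below builds a name in the same timestamp-plus-words format using only the standard library. It is an illustration, not Composer's implementation: the helper make_run_name and both word lists are hypothetical, and the trainer draws its words from its own coolname source rather than from these short lists.

```python
import random
import time

# Hypothetical word lists, for illustration only.
ADJECTIVES = ["infrared", "brisk", "quiet", "amber"]
ANIMALS = ["ferret", "heron", "lynx", "otter"]


def make_run_name(now=None, rng=None):
    """Return '<unix-timestamp>-<adjective>-<animal>' (hypothetical helper)."""
    # Use the supplied time if given, otherwise the current time in seconds.
    timestamp = int(time.time() if now is None else now)
    rng = rng or random
    return f"{timestamp}-{rng.choice(ADJECTIVES)}-{rng.choice(ANIMALS)}"


name = make_run_name(now=1657932618)
print(name)
```

Because the timestamp comes first, generated names sort chronologically when listed alphabetically.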
How the Run Name is Used#
The run_name is added as an attribute to State, and it is used by various other pieces of the Composer infrastructure, as described below.
Composer often uses the run_name as a placeholder in a format string. For example, if you name a file with the string '{run_name}-foo-bar', the placeholder is filled in with the actual run_name at runtime, so the file will actually be named "1657932618-infrared-ferret-foo-bar".
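The substitution described above behaves like ordinary Python string formatting. The snippet below is a minimal sketch using plain str.format; Composer's own formatting helper may support additional placeholders and is not reproduced here.

```python
# The run name from the example above.
run_name = "1657932618-infrared-ferret"

# A file name template containing the {run_name} placeholder.
template = "{run_name}-foo-bar"

# Fill in the placeholder, as would happen at runtime.
filename = template.format(run_name=run_name)
print(filename)  # 1657932618-infrared-ferret-foo-bar
```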
Run Names in Checkpoint Saving#
When saving checkpoints, you can use the run_name as a placeholder in a format string to name folders and checkpoint files, both locally and in the cloud if you are uploading your checkpoints using Weights and Biases or a RemoteUploaderDownloader.
See CheckpointSaver for more information on specifying file and folder names with the run_name when creating a Trainer object.
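As a rough sketch of what this can look like, the snippet below collects checkpoint naming options in a dictionary and resolves the templates by hand. The keyword names save_folder, save_filename, and save_interval, and the epoch, batch, and rank placeholders, are assumptions here; confirm them against the CheckpointSaver documentation before relying on them. The step values are made up for illustration.

```python
# Assumed Trainer keyword arguments for checkpoint naming; verify the exact
# names and defaults in the CheckpointSaver documentation.
checkpoint_kwargs = {
    "save_folder": "checkpoints/{run_name}",
    "save_filename": "ep{epoch}-ba{batch}-rank{rank}.pt",
    "save_interval": "1ep",
}

# With Composer installed, these would be passed alongside the run name:
# trainer = Trainer(..., run_name="my-cool-run-name", **checkpoint_kwargs)

# Illustration only: resolve the templates by hand with made-up step values.
values = {"run_name": "my-cool-run-name", "epoch": 1, "batch": 469, "rank": 0}
folder = checkpoint_kwargs["save_folder"].format(**values)
filename = checkpoint_kwargs["save_filename"].format(**values)
print(f"{folder}/{filename}")
```

Putting the run_name in the folder rather than the file name keeps each run's checkpoints together and avoids collisions between runs.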
Run Names in Logging#
Beyond checkpointing, loggers also use the run_name by default when naming their outputs.
Experiment Tracking Loggers#
The TensorboardLogger will save all the logs for a run to a folder called run_name, and the name of each run in the Tensorboard GUI will be run_name.
The run_name you specify will be used by the WandBLogger as the run name for Weights and Biases.
The run_name you specify will be used by the CometMLLogger as the run name for your Comet experiment.
Object Store Logger#
The RemoteUploaderDownloader will often use the run_name as part of how it names objects.
File Logger#
The FileLogger also uses the run_name: by default, it logs to the file '{run_name}/logs-rank{rank}.txt'.
See Logging for more information.
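To see what the default FileLogger template implies for a multi-process run, the sketch below expands it for each rank. The helper log_paths and the world_size parameter are hypothetical names used for illustration; only the template string comes from the text above.

```python
from pathlib import PurePosixPath

# Default FileLogger file name template, as quoted above.
DEFAULT_LOG_TEMPLATE = "{run_name}/logs-rank{rank}.txt"


def log_paths(run_name, world_size):
    """Expand the template once per rank (hypothetical helper)."""
    return [
        PurePosixPath(DEFAULT_LOG_TEMPLATE.format(run_name=run_name, rank=rank))
        for rank in range(world_size)
    ]


paths = log_paths("my-cool-run-name", world_size=2)
for path in paths:
    print(path)
```

Each rank writes to its own file, and all of a run's log files share one folder named after the run_name.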
Run Names in Profiling#
The profiling tools also save profiling files to folders named after the run_name. See Performance Profiling for more information.