Set up your environment

The MosaicML platform makes it easy to configure the environment your code runs in.

Secrets

Secrets are credentials or other sensitive information used to configure access to a variety of services.

All secrets are stored securely in a vault, maintained across your clusters, and added to every run and deployment. Your secrets are never shared with other users.

For more information, see the Secrets page.

Docker

Build a Docker image with all the system packages your code requires. Including dependencies in the image, especially large ones, speeds up run start time because they do not have to be installed when the run starts. For more information, see the Docker documentation.

We maintain a set of public Docker images for PyTorch, PyTorch Vision, and Composer on Docker Hub.

To run with an existing Docker image, use the image field. In YAML:

image: mosaicml/composer:latest

In Python:

from mcli import RunConfig
config = RunConfig(image='mosaicml/composer:latest',
                   ...)

Docker Tags

We strongly recommend using a fixed tag instead of latest for Docker images to ensure reproducibility: latest can point to a different image each time it is pulled. Create and use versioned tag names (e.g. v1.7.0) for your own Docker images.
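The recommendation above can be checked automatically before a run is submitted. The following is a minimal sketch, not part of mcli: the helper names image_tag and is_pinned are hypothetical, the image my-org/my-image is a placeholder, and treating a digest reference (name@sha256:...) as pinned is an assumption of this example.

```python
from typing import Optional


def image_tag(image: str) -> Optional[str]:
    """Return the explicit tag of an image reference, or None if it has none."""
    # Drop a digest suffix such as "@sha256:...", if present.
    name = image.split("@", 1)[0]
    # Look only at the last path component, so that a registry port
    # (e.g. "localhost:5000/img") is not mistaken for a tag.
    last = name.rsplit("/", 1)[-1]
    if ":" in last:
        return last.split(":", 1)[1]
    return None


def is_pinned(image: str) -> bool:
    """True if the image uses a fixed tag (or a digest) rather than latest."""
    if "@" in image:
        # Assumption of this sketch: a digest reference counts as pinned.
        return True
    tag = image_tag(image)
    # A missing tag is resolved to "latest" by Docker, so it is not pinned.
    return tag is not None and tag != "latest"


if __name__ == "__main__":
    for ref in ("mosaicml/composer:latest", "my-org/my-image:v1.7.0"):
        print(ref, "pinned" if is_pinned(ref) else "NOT pinned")
```

A check like this could run in CI or in a small wrapper script before a run configuration is submitted.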

To use private images, set up a Docker secret with:

mcli create secrets docker

Environment Variables

Create your own

To add non-sensitive environment variables, use the env_variables field in your YAML:

name: using-env-variables
image: bash
env_variables:
  FOO: 'Hello World!'
command: |
  echo "$FOO"
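Inside the run, a variable set through env_variables is read like any other environment variable. As a minimal sketch, assuming the run's command invokes a Python script instead of echo, the value of FOO could be read as follows. The helper name read_greeting and the fallback string are hypothetical and exist only for illustration.

```python
import os
from typing import Mapping, Optional


def read_greeting(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of FOO, or a fallback message if it is not set."""
    # Default to the real process environment; a plain dict can be passed
    # instead, which makes the function easy to test.
    env = os.environ if env is None else env
    # FOO is the variable defined under env_variables in the YAML above.
    return env.get("FOO", "FOO is not set")


if __name__ == "__main__":
    print(read_greeting())
```

Using a fallback instead of indexing os.environ directly keeps the script usable outside the platform, where FOO may not be defined.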

MosaicML Platform Environment Variables

We automatically set the following environment variables in your run container.

Variable            Description
MASTER_ADDR         The network address of the node with rank 0 in the training job
MASTER_PORT         The network port of the node with rank 0 in the training job
NODE_RANK           The rank of the node the container is running on, indexed from zero
RUN_NAME            The name of your run as seen in the output of mcli get runs
COMPOSER_RUN_NAME   Identical to RUN_NAME, used by Composer
WORLD_SIZE          The total number of GPUs being used for the training run
MOSAICML_PLATFORM   true if you are using the MosaicML Platform, used by Composer
PARAMETERS          The path that your run parameters are stored in
RESUMPTION_INDEX    The number of times your run has resumed, starting at zero
NUM_NODES           The total number of nodes the run is scheduled on
LOCAL_WORLD_SIZE    The number of GPUs available to the run on each node
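These variables can be read from your own code, for example to log only from the rank 0 node. The sketch below is one possible way to collect them, not an API provided by the platform: the names PlatformEnv and load_platform_env are hypothetical, and it assumes that the numeric variables are set to plain integer strings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PlatformEnv:
    """Typed view of some of the variables the platform sets in a run."""
    run_name: str
    master_addr: str
    master_port: int
    node_rank: int
    num_nodes: int
    local_world_size: int
    world_size: int

    @property
    def is_rank_zero_node(self) -> bool:
        # NODE_RANK is indexed from zero, so rank 0 is the first node.
        return self.node_rank == 0


def load_platform_env(env: Optional[Mapping[str, str]] = None) -> PlatformEnv:
    """Build a PlatformEnv; raises KeyError if a variable is missing."""
    env = os.environ if env is None else env
    return PlatformEnv(
        run_name=env["RUN_NAME"],
        master_addr=env["MASTER_ADDR"],
        # Assumption: numeric variables are plain integer strings.
        master_port=int(env["MASTER_PORT"]),
        node_rank=int(env["NODE_RANK"]),
        num_nodes=int(env["NUM_NODES"]),
        local_world_size=int(env["LOCAL_WORLD_SIZE"]),
        world_size=int(env["WORLD_SIZE"]),
    )


if __name__ == "__main__":
    # Only do this lookup when actually running on the platform.
    if os.environ.get("MOSAICML_PLATFORM") == "true":
        info = load_platform_env()
        if info.is_rank_zero_node:
            print(f"{info.run_name}: {info.world_size} GPUs on {info.num_nodes} nodes")
```

Passing a mapping explicitly, rather than always reading os.environ, lets the same function be exercised in unit tests outside a run container.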

Many integrations and secrets also set environment variables automatically. For instance, AWS S3 secrets set AWS_CONFIG_FILE and AWS_SHARED_CREDENTIALS_FILE. Refer to the secrets documentation to learn more.