Set up your environment#
You can configure the environment your code runs in through secrets, Docker images, and environment variables.
Secrets#
Secrets are credentials or other sensitive information used to configure access to a variety of services. Secrets can enable you to:
Access a private Docker image
Access a private GitHub repo
Configure API keys, for example, an API key from Weights and Biases for experiment tracking or from Databricks Mosaic AI Training to launch runs within runs
Access storage: AWS S3, GCP, OCI, CoreWeave, Cloudflare
All secrets are stored securely in a vault, maintained across your clusters, and added to every run and deployment. Your secrets are never shared with other users.
For more information, see the Secrets page.
Docker#
Build a Docker image with all the system packages your code requires. Including dependencies in the image, especially large ones, speeds up run start time. For more information, see the Docker documentation.
We maintain a set of public Docker images for PyTorch, PyTorch Vision, and Composer on Docker Hub.
To run with an existing Docker image, use the image field, either in your YAML or in a RunConfig from the Python API:
image: mosaicml/composer:latest
from mcli import RunConfig

config = RunConfig(
    image='mosaicml/composer:latest',
    # ... remaining run settings
)
To use a private image, first set up a Docker secret with:
mcli create secrets docker
Environment Variables#
Create your own#
To add non-sensitive environment variables, use the env_variables field in your YAML:
name: using-env-variables
image: bash
env_variables:
  FOO: 'Hello World!'
command: |
  echo "$FOO"
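A variable set through env_variables is visible to any process in the run container, not only to shell commands. As a minimal sketch, a Python entry point could read FOO as shown below. The helper name `get_env_variable` is hypothetical and not part of mcli.

```python
import os
from typing import Optional


def get_env_variable(name: str, default: Optional[str] = None) -> str:
    """Return an environment variable's value, or raise a clear error if it is unset."""
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"Environment variable {name!r} is not set")
    return value


if __name__ == "__main__":
    # FOO is the variable defined under env_variables in the YAML above.
    # The fallback keeps this script runnable outside a run container.
    print(get_env_variable("FOO", default="<FOO is not set>"))
```

Failing early with a named variable tends to be easier to debug than a missing value surfacing later in training code.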
Runtime Environment Variables#
We automatically set the following environment variables in your run container.
| Variable | Description |
|---|---|
| | The network address of the node with rank 0 in the training job |
| | The network port of the node with rank 0 in the training job |
| | The rank of the node the container is running on, indexed at zero |
| | The name of your run as seen in the output of |
| | Identical to |
| | The total number of GPUs being used for the training run |
| | |
| | The path that your run parameters are stored in |
| | The index of the number of times your run has resumed, starting at zero |
| | The total number of nodes the run is scheduled on |
| | The number of GPUs available to the run on each node |
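Training code often gathers these values once at startup. The following is a rough sketch of that pattern. The variable names it reads (MASTER_ADDR, MASTER_PORT, NODE_RANK, WORLD_SIZE, NUM_NODES, LOCAL_WORLD_SIZE) are assumptions borrowed from common PyTorch distributed conventions, so check them against the variables actually present in your run container. The `DistributedEnv` class and `read_distributed_env` function are hypothetical helpers.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DistributedEnv:
    master_addr: str    # network address of the rank 0 node
    master_port: int    # network port of the rank 0 node
    node_rank: int      # zero-indexed rank of this node
    world_size: int     # total number of GPUs in the run
    num_nodes: int      # total number of nodes in the run
    gpus_per_node: int  # GPUs available on each node


def read_distributed_env(env: Mapping[str, str] = os.environ) -> DistributedEnv:
    """Parse distributed-training settings from environment variables.

    ASSUMPTION: the variable names below are illustrative and may differ
    from the names set in your run container.
    """
    return DistributedEnv(
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
        node_rank=int(env["NODE_RANK"]),
        world_size=int(env["WORLD_SIZE"]),
        num_nodes=int(env["NUM_NODES"]),
        gpus_per_node=int(env["LOCAL_WORLD_SIZE"]),
    )


if __name__ == "__main__":
    # Example values only; inside a run container you would call
    # read_distributed_env() with no arguments.
    example = {
        "MASTER_ADDR": "10.0.0.1",
        "MASTER_PORT": "7501",
        "NODE_RANK": "0",
        "WORLD_SIZE": "16",
        "NUM_NODES": "2",
        "LOCAL_WORLD_SIZE": "8",
    }
    print(read_distributed_env(example))
```

Passing the mapping as a parameter keeps the parser testable without modifying the real process environment.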
Many integrations and secrets also set environment variables automatically. For instance, AWS S3 secrets set AWS_CONFIG_FILE and AWS_SHARED_CREDENTIALS_FILE.
Refer to the secrets documentation to learn more.
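Before a run starts reading from S3, it can help to confirm that the secret was actually mounted. The sketch below checks that AWS_CONFIG_FILE and AWS_SHARED_CREDENTIALS_FILE are set and point to existing files. The function name `missing_aws_files` is hypothetical, and the check assumes that both variables hold file paths, as their names suggest.

```python
import os
from pathlib import Path
from typing import List, Mapping

# Variables that an AWS S3 secret is expected to set, per the text above.
AWS_FILE_VARIABLES = ("AWS_CONFIG_FILE", "AWS_SHARED_CREDENTIALS_FILE")


def missing_aws_files(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the variable names that are unset or do not point to an existing file."""
    missing = []
    for name in AWS_FILE_VARIABLES:
        path = env.get(name)
        if not path or not Path(path).is_file():
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_aws_files()
    if problems:
        print("AWS secret does not appear to be configured:", ", ".join(problems))
    else:
        print("AWS config and credentials files found.")
```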