Managing Compute

Databricks Mosaic AI training configures and manages clusters for you automatically.

To view clusters you have access to:

mcli get clusters

To view current cluster utilization:

mcli util

Requesting compute resources

When you submit a run or deployment, Databricks Mosaic AI training tries to infer which compute resources to use. Which fields you must specify depends on how many clusters, and of what types, are available to you or your organization. If the requested resources are invalid, or if more than one option still matches, the submission fails with an error and the run is not created.
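The inference step can be pictured as a filter. Start from the clusters you can access, narrow them by whichever fields you supplied, and accept the result only if exactly one option remains. The sketch below is a hypothetical model of that behavior, not the mcli implementation. The `Cluster` and `Instance` types, their fields, `resolve_compute`, and every GPU type name other than `a100_80gb` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instance:
    # Hypothetical description of one instance type offered by a cluster.
    name: str
    gpu_type: str
    gpus_per_node: int


@dataclass(frozen=True)
class Cluster:
    # Hypothetical description of a cluster you have access to.
    name: str
    instances: Tuple[Instance, ...]


def resolve_compute(
    clusters: List[Cluster],
    cluster: Optional[str] = None,
    gpu_type: Optional[str] = None,
    instance: Optional[str] = None,
) -> Tuple[str, Instance]:
    """Return the single (cluster name, instance) pair matching the request.

    Raises ValueError when nothing matches or when several options remain,
    modeling how an invalid or ambiguous request fails at submission time.
    """
    # Keep every (cluster, instance) pair consistent with the fields given;
    # a field left as None does not constrain the search.
    candidates = [
        (c.name, i)
        for c in clusters
        if cluster is None or c.name == cluster
        for i in c.instances
        if (gpu_type is None or i.gpu_type == gpu_type)
        and (instance is None or i.name == instance)
    ]
    if not candidates:
        raise ValueError("No available compute matches the request")
    if len(candidates) > 1:
        options = ", ".join(f"{name}/{inst.name}" for name, inst in candidates)
        raise ValueError(f"Ambiguous request; specify more fields. Options: {options}")
    return candidates[0]
```

Suppose two clusters are accessible. Calling `resolve_compute(clusters)` with no fields raises, while `resolve_compute(clusters, cluster="my-cluster")` succeeds. This mirrors why cluster becomes required once you have access to more than one cluster.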

Field    | Type | Details
-------- | ---- | -------
gpus     | int  | Typically required, unless you specify nodes or submit a CPU-only run
cluster  | str  | Required if you have access to multiple clusters
gpu_type | str  | Optional
instance | str  | Optional. Only needed if the cluster offers multiple GPU instance types
nodes    | int  | Optional. Alternative to gpus; nodes typically have 8 GPUs each
cpus     | int  | Optional
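The table's fields can be modeled as a record where every entry is optional, with the GPU count derived from either gpus or nodes. The sketch below is hypothetical. `ComputeRequest`, `total_gpus`, and the default of 8 GPUs per node are assumptions based on the "typically 8 GPUs per node" note. It also assumes, without confirmation from the source, that a request giving cpus but neither gpus nor nodes is CPU-only.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: nodes "typically" have 8 GPUs; real clusters may differ.
DEFAULT_GPUS_PER_NODE = 8


@dataclass
class ComputeRequest:
    # Fields mirror the compute table; all are optional in this sketch.
    gpus: Optional[int] = None
    cluster: Optional[str] = None
    gpu_type: Optional[str] = None
    instance: Optional[str] = None
    nodes: Optional[int] = None
    cpus: Optional[int] = None

    def total_gpus(self, gpus_per_node: int = DEFAULT_GPUS_PER_NODE) -> int:
        """Return the GPU count implied by the request (0 if CPU-only)."""
        if self.gpus is not None:
            # This sketch uses gpus directly whenever it is given.
            return self.gpus
        if self.nodes is not None:
            # nodes is an alternative to gpus: scale by GPUs per node.
            return self.nodes * gpus_per_node
        if self.cpus is not None:
            # Hypothetical rule: cpus without gpus/nodes means CPU-only.
            return 0
        raise ValueError("Specify gpus, nodes, or cpus for a CPU-only run")
```

Under these assumptions, `ComputeRequest(nodes=2).total_gpus()` and `ComputeRequest(gpus=16).total_gpus()` describe the same amount of GPU compute.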

For example, to launch a multi-node run on the cluster my-cluster with 16 A100 (80 GB) GPUs:

compute:
  cluster: my-cluster
  gpus: 16
  gpu_type: a100_80gb
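The example is multi-node because 16 GPUs exceed what one node typically holds. The arithmetic is sketched below. `nodes_for` is a hypothetical helper, and its default of 8 GPUs per node is the typical figure from the table, not a guarantee for your cluster.

```python
import math


def nodes_for(gpus: int, gpus_per_node: int = 8) -> int:
    """Number of whole nodes needed to host `gpus` GPUs.

    The default of 8 GPUs per node is an assumption; check the instance
    size your cluster actually offers.
    """
    if gpus <= 0 or gpus_per_node <= 0:
        raise ValueError("gpus and gpus_per_node must be positive")
    return math.ceil(gpus / gpus_per_node)


# The compute section from the YAML example, as a plain dict.
compute = {"cluster": "my-cluster", "gpus": 16, "gpu_type": "a100_80gb"}
print(nodes_for(compute["gpus"]))  # 2
```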

Most compute fields can also be passed as optional CLI arguments.