Mosaic AI CLI & SDK Documentation
Custom Training: Pretrain, finetune, and evaluate models using llm-foundry or your own custom image and training code. This is the most powerful and flexible way to train models on Mosaic AI.
Pretraining API (in preview): Train DBRX from scratch with ease using our Pretraining API. This is the only way to train DBRX from scratch on Mosaic AI with the speedups built while training DBRX.
Finetuning API: Confidently finetune and adapt models with our Finetuning API. It is less flexible than Custom Training, but it includes a large set of prebuilt models and training configurations that “just work”.
Key features
Easily scale training across multiple nodes:
mcli run -f gpt_70b.yaml --gpus 256
Direct jobs across multiple clouds with a single flag.
> mcli get clusters
NAME           PROVIDER  GPU_TYPES_AND_NUMS
onprem-oregon  MosaicML  a100_40gb: [1, 2, 4, 8, 16, 32, 64, 128]
                         none (CPU only): [0]
aws-us-west-2  AWS       a100_80gb: [1, 2, 4, 8, 16]
                         none (CPU only): [0]
aws-us-east-1  AWS       a100_40gb: [1, 2, 4, 8, 16]
                         none (CPU only): [0]
oracle-sjc     OCI       a100_40gb: [1, 2, 4, 8, 16, 32, 64, 128, 256]
                         none (CPU only): [0]
mcli run -f gpu_30b.yaml --gpus 64 --cluster oracle-sjc
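As a hedged illustration of how a listing like the one above could drive cluster selection in a script, the sketch below copies the GPU counts from the sample `mcli get clusters` output into a plain dictionary and finds the clusters that can satisfy a request. The `CLUSTERS` dictionary and both helper functions are hypothetical names invented for this example and are not part of mcli; only the `-f`, `--gpus`, and `--cluster` flags come from the commands shown above.

```python
from __future__ import annotations

import shlex

# Hypothetical snapshot of the sample `mcli get clusters` listing.
# CPU-only entries are omitted because this sketch only selects GPU clusters.
CLUSTERS = {
    "onprem-oregon": {"a100_40gb": [1, 2, 4, 8, 16, 32, 64, 128]},
    "aws-us-west-2": {"a100_80gb": [1, 2, 4, 8, 16]},
    "aws-us-east-1": {"a100_40gb": [1, 2, 4, 8, 16]},
    "oracle-sjc": {"a100_40gb": [1, 2, 4, 8, 16, 32, 64, 128, 256]},
}


def clusters_supporting(gpu_type: str, num_gpus: int) -> list[str]:
    """Return the clusters that list `num_gpus` as a valid count for `gpu_type`."""
    return [
        name
        for name, gpu_types in CLUSTERS.items()
        if num_gpus in gpu_types.get(gpu_type, [])
    ]


def build_run_command(config_file: str, num_gpus: int, cluster: str) -> str:
    """Assemble an `mcli run` command line from the flags shown in the text."""
    args = ["mcli", "run", "-f", config_file,
            "--gpus", str(num_gpus), "--cluster", cluster]
    return shlex.join(args)


candidates = clusters_supporting("a100_40gb", 64)
print(candidates)  # ['onprem-oregon', 'oracle-sjc']
print(build_run_command("gpu_30b.yaml", 64, candidates[-1]))
```

In practice the dictionary would be built from live cluster information rather than hard-coded; it is fixed here so that the example stays self-contained.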
Fully featured Python API. Build advanced workflows for your team.
from mcli import wait_for_run_status, Run, RunConfig, RunStatus, create_run

def monitor_run(run: Run, max_retries: int):
    """Monitor and resubmit failed runs for automatic resumption."""
    num_retries = 0
    while True:
        # Block until the run completes or ends in another terminal state.
        # Keep the returned Run so that run.status reflects the latest state.
        run = wait_for_run_status(run, RunStatus.COMPLETED)
        if run.status == RunStatus.FAILED:
            num_retries += 1
            if num_retries > max_retries:
                raise RuntimeError('Exceeded maximum number of retries')
            # Resubmit a copy of the failed run
            run = run.clone()
            print(f'Failure detected, resubmitting new run: {run.name}')
        else:
            print(f'Run {run.name} completed successfully with status {run.status}')
            break

# Load the run configuration, submit it, and monitor it until it finishes
config = RunConfig.from_file('resnet50.yaml')
run = create_run(config)
monitor_run(run, max_retries=5)
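The retry policy in `monitor_run` can be checked without submitting anything by swapping the platform calls for stand-ins. The sketch below is a hedged, offline illustration: `FakeRun`, its scripted `outcomes` list, and `monitor_with_retries` are hypothetical names invented here, the statuses are plain strings, and no mcli call is made. It only mirrors the control flow: wait for a final status, resubmit on failure, and give up after `max_retries` resubmissions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeRun:
    """Hypothetical stand-in for a submitted run; not an mcli class."""
    base_name: str
    outcomes: List[str]  # scripted final status for each attempt
    attempt: int = 0

    @property
    def name(self) -> str:
        return f"{self.base_name}-attempt{self.attempt}"

    def wait(self) -> str:
        """Pretend to block until the run finishes and return its final status."""
        return self.outcomes[self.attempt]

    def clone(self) -> "FakeRun":
        """Pretend to resubmit the run as the next attempt."""
        return FakeRun(self.base_name, self.outcomes, self.attempt + 1)


def monitor_with_retries(run: FakeRun, max_retries: int) -> FakeRun:
    """Resubmit failed runs until one finishes without failing or retries run out."""
    num_retries = 0
    while True:
        status = run.wait()
        if status != "FAILED":
            return run
        num_retries += 1
        if num_retries > max_retries:
            raise RuntimeError("Exceeded maximum number of retries")
        run = run.clone()


# Two scripted failures followed by a success: the third attempt is returned.
final = monitor_with_retries(
    FakeRun("resnet50", ["FAILED", "FAILED", "COMPLETED"]), max_retries=5
)
print(final.name)  # resnet50-attempt2
```

With `max_retries=1` the same scripted outcomes raise `RuntimeError`, because the second failure would require a second resubmission.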
We support integrations with all your favorite tooling: Git, Weights & Biases, CometML, and more!
About Us
The mission of Databricks Mosaic AI is to make training and tuning of large AI models accessible. We continually productionize state-of-the-art research on efficient model training and study combinations of these methods to ensure that model training is ✨ as efficient as possible ✨.
If you have questions, please feel free to reach out to us on Twitter, by email, or in our Slack channel!