Looking for the latest release notes? See v0.6.x


Bugfix where experimentTracker was not returned on createFinetune


  • Additional finetuning docs

  • Update finetuning config to use experimentTracker instead of experimentTrackers

  • Remove --follow from mcli finetune

  • Remove generic 500s from MAPI retries

  • Small bug fix on interactive event loops


  • Don't strip strings when printing BaseSubmissionConfig

  • Add new list_finetuning_events function

  • Fix update_run_metadata to serialize input data or ignore the data if it is not serializable


  • Documentation updates


  • Finetuning docs updates

  • Improvements to mcli describe ft display:

    • Rename Reason column to Details

    • Hide Details when null


  • Use runType filter instead of isInteractive to fetch interactive runs

  • Report credentials check failures as 'Failed'

  • Documentation updates

  • Fix stop run name filter bugs

  • Fix estimated end time on display


  • Updated documentation for Hugging Face on mcli

  • mcli describe ft now shows original submitted yaml


  • Add mcli stop and delete finetuning runs

  • Update formatting for mcli describe ft


  • Add optional --container flag to mcli logs to filter logs by container name


  • Docs updates around managed mlflow.

  • Move errors that don't get retried to debug mode.


  • Add retry and an optional bool to protect from SIGTERMs

  • Add dependent deployments for eval.



    • mcli finetune sdk now returns a distinct FineTune object as opposed to a Run object

    • corresponding documentation updates


  • Add node_name to the MCLI admin support command


  • Add gpu_type to response for cluster utilization


  • Add support for UC volumes as data input for the Finetuning API

  • Modifies the cluster utilization type to support MVP version of serverless scheduling for runs

  • Adds new Run field parent_name to support tracking for runs that spawn child runs


  • Fix S3 secret bug

  • Error out for unknown finetuning fields


  • Sort run metadata keys by value not length in mcli describe run

  • Fix bug with watchdog retry logic



    • The task_type for instruction finetuning has changed from IFT to INSTRUCTION_FINETUNE

    • Removed instruction_finetune function. Please use the finetune function with task_type="INSTRUCTION_FINETUNE"


  • Bugfix for invalid profile error when creating s3 secrets


  • You can now use mcli logs <run> --tail N to get the latest N log lines from your run

  • Added support for "Continued Pretraining" to mcli finetune

  • Added Databricks secret type


  • Fixed S3 secrets created without explicit profiles

  • Run reason autopopulates from latest attempt

  • Support for HF secrets


  • Added Reason column to mcli util displaying the reason for pending or queued runs under the queued training runs section.


  • Fix for describe run

  • Update to finetuning API


  • Adds alias mcli util -t for mcli util --training

  • Fixes bug introduced in mcli describe run from 0.5.10

  • Fixes bug in ETA in mcli get runs


  • Finetuning compute updates

  • Fix describe run bugs

  • Add estimated end time to commands

  • Display cpu information in describe cluster


  • Doc updates for finetuning

  • Added --yaml-only flag for runs

  • Fixed timestamp bug.


  • Added mcli stop deployment as well as sdk support for it

  • Added code evaluation

  • Allow mcli-admin to be permissioned to stop and restart other users' runs

  • Add override_eval_data_path and tests for finetuning

  • Fix --resumption and --prev flags for mcli logs


  • Add cluster and GPU information to mcli connect

  • Validate tty to accept StringIO stdin for mcli interactive

  • Finetuning docs updates

  • Inference no-code docs updates


  • New create_default_deployment & get_cluster SDK functions

  • Increase default predict timeout to 60 seconds

  • Add pip command for how to upgrade mcli

  • Show node statuses in mcli describe run view

  • By default mcli get runs shows only the latest 3 resumptions

  • Max duration improvements

  • Improved mcli get deployments view

  • Add max_batch_size_in_bytes to BatchingConfig


  • rateLimit can be specified in submission YAMLs

  • Visual improvements to get runs and describe runs output

  • Initial finetuning support


  • Small version change for finetuning API


  • Improved mcli get deployments with full and --compact mode

  • Filter mcli util using --training and --inference arguments

  • Early release of finetuning API (subject to change)


  • Add mcli describe cluster support to show a detailed view with cluster information


  • Add --max-duration flag for creating and updating runs

  • Add ability to view the logs of the latest failed replica with mcli get deployment logs --failed


This page includes information on updates and features related to the MosaicML CLI and SDK. The first sections cover general features and deprecation notices, followed by features specific to the training and inference products, respectively.


CLI AutoComplete#

We now support tab autocomplete in bash and zsh shells! To enable, run:

eval "$(register-python-argcomplete mcli)"

Deprecation & breaking changes#

  • ClusterUtilization object returned from get_clusters(utilization=True):

    • active_by_user is now active_runs_by_user

    • queued_by_user is now queued_runs_by_user

    • ClusterUtilizationByRun now has name columns instead of run_name

  • Deprecated two RunStatus values:

    • FAILED_PULL - This is reported as RunStatus.FAILED with reason FailedImagePull

    • SCHEDULED - This is synonymous with RunStatus.QUEUED
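If you have scripts that still rely on the old names, a small migration shim can translate them. This is a hypothetical helper for illustration only, not part of the mcli SDK; it assumes you are working with plain dict payloads and string status values:

```python
# Hypothetical migration shim (not part of the mcli SDK): translates the
# deprecated field names and RunStatus values listed above to their
# replacements.

DEPRECATED_FIELDS = {
    "active_by_user": "active_runs_by_user",
    "queued_by_user": "queued_runs_by_user",
    "run_name": "name",
}

DEPRECATED_STATUSES = {
    "FAILED_PULL": "FAILED",  # now reported as FAILED with reason FailedImagePull
    "SCHEDULED": "QUEUED",    # SCHEDULED was synonymous with QUEUED
}


def migrate_utilization(payload: dict) -> dict:
    """Return a copy of `payload` with deprecated keys renamed."""
    return {DEPRECATED_FIELDS.get(k, k): v for k, v in payload.items()}


def migrate_status(status: str) -> str:
    """Map a deprecated RunStatus value to its replacement."""
    return DEPRECATED_STATUSES.get(status, status)
```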

New training features#

First-class watchdog (🐕) support#

Hero run users are well familiar with our watchdog script, which automatically resumes your run after a system failure using a Python script embedded in your YAML.

🚀 Now, we are launching first-class support for watchdog! 🚀

# Enable watchdog for an existing run
mcli watchdog <run>

# Disable watchdog for an existing run
mcli watchdog <run> --disable

You're still able to configure resumable: True in your yaml if you'd like to launch watchdog at the start. Also, see autoresume within Composer for a fully managed autoresumption experience from the last checkpoint.

If watchdog was configured for your run, you'll see a 🐕 icon next to your run_name in the mcli get runs display.

NAME                       USER                CREATED_TIME         STATUS    START_TIME           END_TIME             CLUSTER  INSTANCE      NODES
finetune-mpt-7b-bZOcnU ๐Ÿ•  [email protected]  2023-07-20 05:34 PM  Completed  2023-07-20 05:35 PM  2023-07-20 05:47 PM r1z1     8x a100_80gb  1

By default, enabling watchdog will automatically retry your run 10 times. You can configure this default in your yaml by overriding the max_retries scheduling parameter:

scheduling:
  resumable: True
  max_retries: 5
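Conceptually, watchdog wraps your run in a bounded retry loop: a retryable failure (such as a node failure or preemption) triggers a resumption until the run succeeds or max_retries is exhausted. A simplified illustration of that behavior, not the platform's actual scheduler code:

```python
def run_with_watchdog(attempt_run, max_retries: int = 10):
    """Illustrative only: retry `attempt_run` until it succeeds or retries
    are exhausted. `attempt_run(resumption)` returns True on success and
    False on a retryable failure (e.g. preemption or node failure)."""
    for resumption in range(max_retries + 1):
        if attempt_run(resumption):
            return resumption  # number of resumptions used
    raise RuntimeError(f"run failed after {max_retries} retries")
```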

(Preview) Interactive runs#

Interactive runs give you the ability to debug and iterate quickly inside your cluster in a secure way. Interactivity works on top of the existing MosaicML runs and adds two new CLI commands:

# Submit a new run entirely for interactive debugging
mcli interactive --hours 1 --gpus 1 --tmux --cluster <cluster-name>

# Connect to an existing run (either launched via mcli run or mcli interactive)
mcli connect --tmux <run-name>

You can find the full docs and details about connecting using VSCode here.

Improved resumption UX#

If your run autoresumes on our platform, you'll see a new view when fetching runs that displays high-level information on the multiple resumptions:

> mcli get runs

NAME                         USER                CREATED_TIME         RESUMPTION  STATUS     START_TIME           END_TIME             CLUSTER  INSTANCE      NODES
long-run-GfeqDT              [email protected]  2023-06-21 05:36 PM  1           Completed  2023-06-21 05:36 PM  2023-06-21 06:36 PM  r1z1     cpu           1
                                                                      0           Stopped    2023-06-21 05:36 PM  2023-06-21 05:37 PM  r1z1     cpu           1

Run resumptions are listed in descending order so you can focus on the latest resumption by default. Resumptions for a single run are also grouped visually for comparison.

We also improved the describe view to easily visualize different resumptions of your run, their run states, and their duration. There's also a handy Event Log section that details when states changed within your overall run.

> mcli describe run

Run Lifecycle
Resumption 1:
╭──────── Pending ────────╮ ╭──────── Running ────────╮ ╭─────── Completed ───────╮
│ At: 2023-06-21 05:36 PM │ │ At: 2023-06-21 05:36 PM │ │ At: 2023-06-21 06:36 PM │
│ For: 5s                 │ │ For: 1hr                │ │                         │
╰─────────────────────────╯ ╰─────────────────────────╯ ╰─────────────────────────╯

Initial Run:
╭──────── Pending ────────╮ ╭──────── Running ────────╮ ╭──────── Stopped ────────╮
│ At: 2023-06-21 05:36 PM │ │ At: 2023-06-21 05:36 PM │ │ At: 2023-06-21 05:37 PM │
│ For: 7s                 │ │ For: 1min               │ │ For: 0s                 │
╰─────────────────────────╯ ╰─────────────────────────╯ ╰─────────────────────────╯

Number of Resumptions: 2
Total time spent in Pending: 12s
Total time spent in Running: 1hr

                                     Event Log
┃  Time                 ┃  Resumption  ┃  Event                                    ┃
│  2023-06-21 05:36 PM  │  0           │  Run created                              │
│  2023-06-21 05:36 PM  │  0           │  Run started                              │
│  2023-06-21 05:36 PM  │  1           │  Run placed back in the scheduling queue  │
│  2023-06-21 05:36 PM  │  1           │  Run resumed                              │
│  2023-06-21 06:36 PM  │  1           │  Run completed successfully               │
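The per-state totals shown above can be derived by summing each resumption's state durations. A minimal sketch, assuming a simple list-of-dicts shape for the lifecycle data (the SDK's actual types differ):

```python
from collections import defaultdict


def total_time_by_state(resumptions):
    """Sum the seconds spent in each run state across all resumptions.

    `resumptions` is a list of {state_name: seconds} dicts, one dict per
    resumption (a hypothetical shape for illustration).
    """
    totals = defaultdict(int)
    for lifecycle in resumptions:
        for state, seconds in lifecycle.items():
            totals[state] += seconds
    return dict(totals)
```

For the run above, the initial run spent 7s in Pending and the resumption 5s, giving the 12s Pending total in the display.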

Update scheduling properties of a run#

The scheduling configurations of a run can be updated. This includes:

  • priority: Update the default priority of the run from auto to low or lowest

  • preemptible: Update whether the run can be stopped and re-queued by higher priority jobs; default is False

  • max_retries: Update the max number of times the run can be retried; default is 0

# Update run scheduling fields
mcli update run example-run --preemptible true --max-retries 10 --priority medium

This can also be done in the SDK:

from mcli import get_run

my_run = get_run("example-run")
updated_run = my_run.update(preemptible=True, max_retries=10, priority='medium')

All run parameters can also be updated when cloning:

# Update any parameters when cloning a run
mcli clone <run-name> --gpus 10 --priority low

This can also be done in the SDK:

from mcli import get_run

my_run = get_run("example-run")
new_run = my_run.clone(gpus=10, priority='low')

New inference features#

Batching support#

Batching config allows the user to specify the max batch size and timeout for inference request processing:

  max_batch_size: 4
  max_timeout_ms: 3000
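Conceptually, the server flushes a batch when either limit is hit, whichever comes first: the batch reaches max_batch_size, or max_timeout_ms elapses since the first queued request. A simplified, framework-agnostic sketch of that policy (not the actual inference server code):

```python
import time


class Batcher:
    """Illustrative batcher: collect requests and flush when max_batch_size
    is reached, or when max_timeout_ms has elapsed since the first
    queued request (checked via poll())."""

    def __init__(self, max_batch_size=4, max_timeout_ms=3000):
        self.max_batch_size = max_batch_size
        self.max_timeout_ms = max_timeout_ms
        self.pending = []
        self.first_arrival = None

    def add(self, request):
        """Queue a request; return a full batch if the size cap is hit."""
        if self.first_arrival is None:
            self.first_arrival = time.monotonic()
        self.pending.append(request)
        if len(self.pending) >= self.max_batch_size:
            return self.flush()
        return None

    def poll(self):
        """Flush on timeout even if the batch is not full."""
        waited_ms = (time.monotonic() - self.first_arrival) * 1000 if self.pending else 0
        if self.pending and waited_ms >= self.max_timeout_ms:
            return self.flush()
        return None

    def flush(self):
        batch, self.pending, self.first_arrival = self.pending, [], None
        return batch
```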

View utilization of inference clusters#

mcli util now shows inference usage:

Inference Instances:
r7z14  oci.vm.gpu.a10.2  2xa10      1               1          2
       oci.vm.gpu.a10.1  1xa10      0               4          4

Active Inference Deployments:
DEPLOYMENT_NAME                          USER                          AGE   GPUS
mpt-7b-u0qtof                            [email protected]              4d    1
mpt-30b-vl5mrp                           [email protected]             5d    2
mpt-7b-test-9sc4ta                       [email protected]              21hr  3

Queued Inference Deployments:
No items found.

As shown above, mcli util now also shows GPU instance names along with node info. This is because our clusters now contain nodes whose GPU instances differ only in the number of GPUs per instance; showing the instance name makes it easier to land your deployment or training run on the specific instance you request in your YAML.

Customizable compute resource requests#

Compute specifications can now be configured with the compute field in deployment yamls:

compute:
  cluster: my-cluster
  gpus: 4
  instance: oci.vm.gpu.a10.2

Update properties of a deployment#

After you've created an inference deployment, you can easily update a few configurations with:

mcli update deployment <deployment_name> --replicas 2 --image "new_image"

There's also a handy SDK command for updating your deployment:

from mcli import update_inference_deployments

update_inference_deployments(['name-of-deployment'], {'replicas': 2, 'image': 'new_image'})