Looking for the latest release notes? See v0.5.x


  • Fixes a bug in first-time mcli set api-key calls


  • Adds a default max_retries value of 10 when watchdog is enabled

  • Patches a small bug in the describe display column ordering for a failed run


  • Adds 🐕 to runs in CLI display for runs with watchdog turned on

  • Add max retry checking to watchdog
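As a rough illustration of what max retry checking could look like, here is a minimal sketch. It is not mcli's implementation: WatchdogState and should_resubmit are hypothetical names invented for this example, and the only value taken from these notes is the default of 10 retries.

```python
from dataclasses import dataclass

# Default mentioned in these release notes for runs with watchdog enabled.
DEFAULT_MAX_RETRIES = 10


@dataclass
class WatchdogState:
    """Hypothetical bookkeeping for one watched run (not an mcli class)."""

    max_retries: int = DEFAULT_MAX_RETRIES
    retries_used: int = 0

    def should_resubmit(self, run_failed: bool) -> bool:
        """Return True if a failed run may be retried, counting the retry."""
        if not run_failed or self.retries_used >= self.max_retries:
            return False
        self.retries_used += 1
        return True
```

With max_retries=2, the first two failures would be resubmitted and the third would not.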


  • Patch update_run import bug

  • Add connect functionality to mcli interactive


  • New create_interactive_run API to launch interactive runs

  • Better error handling for inference deployments

  • Small improvements to documentation


  • Update run functionality via Run.update(...), update_run('name', ...), and mcli update run

  • Additional deployment update functionality via mcli update deployment

  • Improved ping and predict error handling

  • Run resumption override support, e.g. mcli run -r run-name --priority low

  • Documentation updates


  • Revamp of mcli describe run: now shows detailed information about run resumptions

  • mcli watchdog command to support automatic run submission on failure

  • mcli get deployments command shows inference deployment replica count


  • Support for the compute field in inference deployments to allow selecting specific GPU instances

  • Dynamic batching config for inference deployments

  • Updated docs for inference deployments


  • Split clusters by submission type (training or inference)

  • Attempts renamed to resumptions


  • Schema validation JSON for run yamls


  • Add SSL certificate warning for macOS


  • Allow inputting just the deployment name for ping and predict commands


  • Support retrieving logs for all run attempts


  • Added SDK support for updating an InferenceDeployment



  • mcli logs defaults to the first failed rank if the run has failed

  • New singleton inference SDK functions: get_inference_deployment, delete_inference_deployment

  • Adds new methods to InferenceDeployment (refresh, ping, and predict):

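from mcli import get_inference_deployment

d = get_inference_deployment('foo')

# refresh() returns an updated copy of the deployment
print(f'{d.status} before')
d = d.refresh()
print(f'{d.status} after')

# `inputs` is the request payload expected by your deployment
status = d.ping()
output = d.predict(inputs)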



What’s Changed

  • Small patch to fix breaking change to mcli get deployment logs in 0.4.1


Backwards breaking changes

  • We are replacing mcli init-kube with mcli kube get-config and mcli kube merge-config

What’s Changed

  • Ping: Don’t throw error with empty content

  • --name is not required to delete deployment

  • Fix ping to actually return status code


Backwards breaking changes

  • Deprecation of LEGACY mode and all associated code and dependencies (including kubernetes!) - removes 16,500 lines of code from mcli! 🔥🔥

  • Remove positional arguments for SDK filters

get_runs(["run-name1", "run-name2"], ["cluster1", "cluster2"])
# TypeError: get_runs() takes from 0 to 1 positional arguments but 2 were given

get_runs(["run-name1", "run-name2"], cluster_names=["cluster1", "cluster2"])
# OK! 🙆‍♀️
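The TypeError above is what Python raises when a function declares its filters as keyword-only. The sketch below reproduces that behavior with a hypothetical stand-in, get_runs_sketch; its signature is an assumption inferred from the error message, not the real get_runs signature.

```python
from typing import List, Optional


def get_runs_sketch(runs: Optional[List[str]] = None, *,
                    cluster_names: Optional[List[str]] = None) -> dict:
    """Hypothetical stand-in for an SDK filter function (not the real get_runs)."""
    # Every parameter after the bare `*` must be passed by keyword.
    return {"runs": runs or [], "cluster_names": cluster_names or []}


# Passing the second filter positionally raises TypeError.
try:
    get_runs_sketch(["run-name1"], ["cluster1"])
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

# Passing it by keyword works.
result = get_runs_sketch(["run-name1"], cluster_names=["cluster1"])
```

Keyword-only filters keep call sites readable and let new filters be added later without changing the meaning of existing positional calls.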

Summary of changes since 0.3.0

  • MCLI can now be imported directly:

from mcli import get_runs, ...

from mcli.sdk import get_runs # don’t worry, this still works!

  • MosaicML Inference!

  • Run resumption and preemption

  • Run metadata

  • Shared runs

  • Better describe run data and cluster specifications
