Dependent Deployments
Experimental
This feature may change or be removed in a future mcli release.
Dependent Deployments is a framework that lets you configure a sidecar image inside a training run. This is useful for tasks such as batch inference or evaluation, which require an inference engine to generate efficiently and to orchestrate large numbers of GPUs.
How it works
Each run will have the following two containers per node:
- Main container, using image, that executes command
- Sidecar "model" container, using dependent_deployment.image, that executes dependent_deployment.command
When the run starts, both images are pulled and loaded into separate containers, and each container then executes its respective command.
Example mcli YAML configuration using the vLLM image:
name: example
image: mosaicml/composer:latest
compute:
  gpus: 8
command: |-
  echo 'TODO: Create a script that waits for http://0.0.0.0:8000/v1 to become available, and then queries it'
# # Optional: main run config
# env_variables:
#   KEY: VALUE
dependent_deployment:
  image: vllm/vllm-openai:latest
  model: {}
  command: |-
    echo 'TODO: a bash command that downloads a model and then launches the server'
  # # Optional: dependent_deployment config
  # env_variables:
  #   KEY: VALUE
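The main container's command must block until the sidecar's server is reachable before sending requests. Below is a minimal sketch of such a readiness check in Python; the function name wait_for_server, the polling parameters, and the /v1/models path are illustrative assumptions, not part of the mcli API.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 600.0, poll: float = 5.0) -> bool:
    """Poll `url` until it responds with success, or until `timeout` seconds elapse.

    Illustrative readiness check: vLLM's OpenAI-compatible server is assumed
    to answer GET requests (e.g. on /v1/models) once it has finished loading.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=5)
            return True  # got an HTTP response: the server is up
        except (urllib.error.URLError, OSError):
            # Connection refused / not yet listening: wait and retry.
            time.sleep(poll)
    return False
```

In the main container you would call something like wait_for_server("http://0.0.0.0:8000/v1/models") and only begin issuing inference requests (or exit non-zero) based on its result.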
You can view the logs of the main container via:
mcli logs <run-name> # --rank 0
And view the logs of the sidecar "dependent deployment" container via:
mcli logs <run-name> -c model # --rank 0
Note: if used for inference, the dependent deployment command must download the model weights and start the inference server, and the command in your main container must include logic to wait until the model container has finished spinning up the server. If either command does not succeed, the run will be marked as "Failed".
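As a sketch of what a filled-in sidecar section might look like, the fragment below assumes the vllm/vllm-openai image provides the vllm CLI (which downloads weights from the Hugging Face Hub on first use); the model name and port are placeholders, not values prescribed by mcli.

```yaml
dependent_deployment:
  image: vllm/vllm-openai:latest
  model: {}
  command: |-
    # Illustrative only: serve a placeholder model on the port the
    # main container polls; vLLM fetches the weights before listening.
    vllm serve facebook/opt-125m --host 0.0.0.0 --port 8000
```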