Dependent Deployments
Experimental
This feature may change or be removed in a future mcli release.
Dependent Deployments is a framework that lets you configure a sidecar image inside a training run. This is useful for tasks such as batch inference or evaluation, which require an inference engine to generate efficiently and to orchestrate large numbers of GPUs.
How it works
Each run will have the following two containers per node:
- Main container, using image, that executes command
- Sidecar "model" container, using dependent_deployment.image, that executes dependent_deployment.command
When the run starts, both images are pulled and loaded into separate containers, and each container then executes its respective command.
Example mcli YAML configuration using the vLLM image:
name: example
image: mosaicml/composer:latest
compute:
  gpus: 8
command: |-
  echo 'TODO: Create a script that waits for http://0.0.0.0:8000/v1 to become available, and then queries it'
# # Optional: main run config
# env_variables:
#   KEY: VALUE
dependent_deployment:
  image: vllm/vllm-openai:latest
  model: {}
  command: |-
    echo 'TODO: a bash command that downloads a model and then launches the server'
  # # Optional: dependent_deployment config
  # env_variables:
  #   KEY: VALUE
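The main container's command must block until the sidecar's server is reachable before sending requests. Below is a minimal sketch of such a readiness check in Python; the function name wait_for_server, the polling parameters, and the /v1/models path are illustrative assumptions, not part of the mcli API.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 600.0, poll: float = 5.0) -> bool:
    """Poll `url` until it responds with success, or until `timeout` seconds elapse.

    Illustrative readiness check: vLLM's OpenAI-compatible server is assumed
    to answer GET requests (e.g. on /v1/models) once it has finished loading.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=5)
            return True  # got an HTTP response: the server is up
        except (urllib.error.URLError, OSError):
            # Connection refused / not yet listening: wait and retry.
            time.sleep(poll)
    return False
```

In the main container you would call something like wait_for_server("http://0.0.0.0:8000/v1/models") and only begin issuing inference requests (or exit non-zero) based on its result.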
You can view the logs of the main container via:
mcli logs <run-name> # --rank 0
And view the logs of the sidecar "dependent deployment" container via:
mcli logs <run-name> -c model # --rank 0
Note: if used for inference, the dependent deployment command must download the model weights and start the inference server, and the command in your main container must include logic to wait until the model container has finished spinning up the server. If either command does not succeed, the run will be marked as "Failed".
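As a sketch of what a filled-in sidecar section might look like, the fragment below assumes the vllm/vllm-openai image provides the vllm CLI (which downloads weights from the Hugging Face Hub on first use); the model name and port are placeholders, not values prescribed by mcli.

```yaml
dependent_deployment:
  image: vllm/vllm-openai:latest
  model: {}
  command: |-
    # Illustrative only: serve a placeholder model on the port the
    # main container polls; vLLM fetches the weights before listening.
    vllm serve facebook/opt-125m --host 0.0.0.0 --port 8000
```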