Interactive Runs#
Interactive runs let you debug and iterate quickly inside your cluster in a secure way. Interactivity works on top of existing MosaicML runs, so a run workload must be submitted to the cluster before you can connect to it. For security purposes, storage is not persisted, so we recommend using your own cloud storage and git repositories to stream and save data between runs.
Launch an interactive run#
Launching new runs
All runs on reserved clusters can be connected to, regardless of how they were launched.
This section goes over mcli interactive, which is a helpful alias for creating simple “sleeper” runs for interactive purposes.
You can also create a custom run configuration for interactive purposes through the normal mcli run entrypoint.
Launch an interactive run by running:
mcli interactive --max-duration 1 --gpus 1 --tmux --cluster <cluster-name>
This command creates a “sleeper” run that will last for 1 hour (--max-duration 1), requests 1 GPU (--gpus 1), and connects to a tmux session (--tmux) within your run.
The --max-duration or --hours argument is required to avoid any large, accidental charges from a forgotten run.
The --tmux argument is strongly recommended to allow your session to persist through any temporary disconnects.
mcli will automatically try to reconnect you to your run whenever you disconnect, so using tmux dramatically improves this experience.
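If you launch interactive runs from scripts, the flags described above can be assembled programmatically. The following is a minimal sketch, not part of mcli: build_interactive_command is a hypothetical helper that only builds the argument list for the command shown above (it does not execute it), and "my-cluster" is a placeholder cluster name.

```python
import shlex
from typing import List


def build_interactive_command(
    cluster: str,
    max_duration_hours: float = 1,
    gpus: int = 1,
    use_tmux: bool = True,
) -> List[str]:
    """Build the argv list for `mcli interactive` (hypothetical helper, not an mcli API)."""
    # A duration is required by the CLI, so reject values that make no sense.
    if max_duration_hours <= 0:
        raise ValueError("max_duration_hours must be positive")
    if gpus < 0:
        raise ValueError("gpus must not be negative")

    argv = [
        "mcli", "interactive",
        "--max-duration", str(max_duration_hours),
        "--gpus", str(gpus),
    ]
    # tmux is on by default here because the docs strongly recommend it.
    if use_tmux:
        argv.append("--tmux")
    argv += ["--cluster", cluster]
    return argv


# "my-cluster" is a placeholder; substitute your own cluster name.
command = build_interactive_command("my-cluster", max_duration_hours=1, gpus=1)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run if you want to launch the run from Python. Building a list rather than a single string avoids shell-quoting problems with cluster names.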
Note that interactive runs act like normal runs:
# see interactive runs on the cluster
mcli util <cluster-name>
# your interactive runs will show up when you call "get runs"
mcli get runs --cluster <cluster-name>
# get more info about your run
mcli describe run <interactive-run-name>
# stop your interactive run early
mcli stop run <interactive-run-name>
# delete it
mcli delete run <interactive-run-name>
from mcli import get_run, get_cluster

# see interactive runs on the cluster
cluster = get_cluster('cluster-name')
print("Active runs in:", cluster.name)
for run in cluster.utilization.active_runs_by_user:
    print(run.run_name, run.user)
# get your interactive run
run = get_run("interactive-run-name")
print(run)
# stop your interactive run early
run.stop()
# delete it
run.delete()
Full documentation for the interactive command
Multi-node support
Multi-node interactive runs are currently only supported on reserved, single-tenant clusters
usage: mcli interactive [-h] [--hours [HOURS]] [--name NAME] [--image IMAGE]
[--max-duration MAX_DURATION] [--cluster CLUSTER]
[--gpu-type TYPE] [--gpus NGPUs] [--no-connect]
[--rank N] [--command COMMAND | --tmux]
[HOURS]
Positional Arguments#
- HOURS
Number of hours the interactive session should run
Named Arguments#
- --hours
Number of hours the interactive session should run
- --name
Name for the interactive session
Default: “interactive”
- --image
Docker image to use
Default: “mosaicml/pytorch”
- --max-duration
The maximum time that a run should run for (in hours). If the run exceeds this duration, it will be stopped.
Compute settings#
These settings are used to determine the cluster and compute resources to use for your interactive session
- --cluster
Cluster where your interactive session should run. If you only have one available, that one will be selected by default. Depending on your cluster, you’ll have access to different GPU types and counts. See the available combinations above.
- --gpu-type
Type of GPU to use. Valid GPU types depend on the cluster and GPU numbers requested
- --gpus
Number of GPUs to run interactively. Valid GPU numbers depend on the cluster and GPU type
Connection settings#
These settings are used for connecting to your interactive session. You can reconnect anytime using mcli connect
- --no-connect
Do not connect to the interactive session immediately
Default: True
- --rank
Connect to the specified node rank within the run
Default: 0
- --command
The command to execute in the run. By default you will be dropped into a bash shell
- --tmux
Use tmux as the entrypoint for your run so your session is robust to disconnects
Default: False
Update a run’s max duration#
After creating an interactive run, you can change its maximum duration.
mcli update run <interactive-run-name> --max-duration <hours>
Connect to a run in the terminal#
Regardless of how you launched the run, you can connect to any running run using:
mcli connect <run-name> --tmux
By default, the session will connect inside a bash shell. We highly recommend using tmux as the entrypoint for your run so your session is robust to disconnects (such as a local internet outage). You can also configure a command other than bash or tmux to execute in the run:
mcli connect --command "top"
If you are running multi-node interactive runs, you can specify the zero-indexed node rank via:
mcli connect --rank 2
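The connection options above can be combined in one call. The sketch below is a hypothetical helper (build_connect_command is not part of mcli) that assembles an mcli connect argument list without running it. It assumes that --command and --tmux are mutually exclusive for mcli connect, as the usage text shows for mcli interactive, and that rank 0 is the default; "my-run" is a placeholder run name.

```python
import shlex
from typing import List, Optional


def build_connect_command(
    run_name: str,
    rank: int = 0,
    command: Optional[str] = None,
    use_tmux: bool = False,
) -> List[str]:
    """Build the argv list for `mcli connect` (hypothetical helper, not an mcli API)."""
    # Assumption: a custom command and tmux cannot be requested together.
    if command is not None and use_tmux:
        raise ValueError("choose either a custom command or tmux, not both")
    if rank < 0:
        raise ValueError("rank is zero-indexed and must not be negative")

    argv = ["mcli", "connect", run_name]
    # Only pass --rank when it differs from the assumed default of 0.
    if rank != 0:
        argv += ["--rank", str(rank)]
    if command is not None:
        argv += ["--command", command]
    elif use_tmux:
        argv.append("--tmux")
    return argv


# "my-run" is a placeholder; substitute the name of your interactive run.
print(shlex.join(build_connect_command("my-run", use_tmux=True)))
print(shlex.join(build_connect_command("my-run", rank=2, command="top")))
```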
Connect to a run with VSCode#
Disclaimer
Due to VSCode Server licensing, we cannot integrate directly with the native VSCode remote development extensions. This guide documents how to get started with the VSCode server using tunneling.
First time local setup: Install VSCode and the remote development extension pack. We recommend reviewing the system requirements and installation guide for the extension pack as some requirements are highly dependent on your operating system.
Step 1: Create an interactive run as documented above
Step 2: Connect to that run via mcli connect
Step 3: Run the following commands to download VS Code server and start it:
trap '/tmp/code tunnel unregister' EXIT
cd /tmp && curl -Lk 'https://code.visualstudio.com/sha/download?build=stable&os=cli-alpine-x64' --output vscode_cli.tar.gz
tar -xf vscode_cli.tar.gz
/tmp/code tunnel --accept-server-license-terms --no-sleep --name mml-dev-01
This will output something like:
*
* Visual Studio Code Server
*
* By using the software, you agree to
* the Visual Studio Code Server License Terms (https://aka.ms/vscode-server-license) and
* the Microsoft Privacy Statement (https://privacy.microsoft.com/en-US/privacystatement).
*
To grant access to the server, please log into https://github.com/login/device and use code ABCD-1234
Step 4: Authenticate using the code provided at https://github.com/login/device and authorize your GitHub account
Step 5: From an existing VSCode window, connect using a remote tunnel by selecting the blue remote window button at the far left of the bottom status bar. Select “Connect to tunnel” from “Remote-Tunnels” and then select the tunnel name (default: “mml-dev-01”)
Alternatively, you can connect in the browser using: https://vscode.dev/tunnel/mml-dev-01/tmp
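If you start the tunnel under a different name or want to open a different folder, the browser URL changes accordingly. The sketch below is a hypothetical helper (tunnel_url is not part of any tool) and assumes the URL keeps the same shape as the example above: the tunnel name followed by the folder path.

```python
from urllib.parse import quote


def tunnel_url(tunnel_name: str, folder: str = "/tmp") -> str:
    """Build a vscode.dev URL for a named tunnel (hypothetical helper).

    Assumption: the URL pattern in the example above,
    https://vscode.dev/tunnel/<name>/<folder>, holds for other names and folders.
    """
    if not tunnel_name:
        raise ValueError("tunnel_name must not be empty")
    base = f"https://vscode.dev/tunnel/{quote(tunnel_name, safe='')}"
    # Drop leading and trailing slashes so the path joins cleanly.
    path = quote(folder.strip("/"))
    return f"{base}/{path}" if path else base


# Reproduces the URL from the example above.
print(tunnel_url("mml-dev-01"))
```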