GCP Storage

To stream data from GCP storage buckets when training models, MCLI needs access to your GCP credentials.

There are two ways to add accessible GCP credentials to MCLI.

GCP Service Account Credentials

The first way is to create a key for a service account that has access to your GCP bucket and provide the resulting JSON credentials file to MCLI. For instructions, see the link here. This lets you access your bucket immediately through the google-cloud-storage client in your code. To add GCP credentials to MCLI this way, run:

mcli create secret gcp

which produces the following output:

> mcli create secret gcp
? What would you like to name this secret? my-gcp-credentials
? Where is your gcp credentials file located? <my_gcp_credentials.json>
✔  Created secret: my-gcp-credentials

You can also pass these values as arguments with --name and --credentials-file, respectively. Your credentials file should follow the standard service account key format described in the Google Cloud documentation:

{
  "type": "service_account",
  "project_id": "<PROJECT_ID>",
  "private_key_id": "<KEY_ID>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<PRIVATE_KEY>\n-----END PRIVATE KEY-----\n",
  "client_email": "<SERVICE_ACCOUNT_EMAIL>",
  "client_id": "<CLIENT_ID>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/<SERVICE_ACCOUNT_EMAIL>"
}
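Before handing a key file to mcli create secret gcp, it can help to confirm locally that the file has the expected shape. The sketch below is a hypothetical pre-flight helper (check_service_account_file is an illustrative name, not part of MCLI or any Google library). It assumes the fields listed in the example above are the ones worth checking, and it only inspects the JSON without contacting Google.

```python
import json
from pathlib import Path

# Fields taken from the example credentials file above (assumed sufficient for a sanity check).
REQUIRED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "token_uri",
)


def check_service_account_file(path: str) -> dict:
    """Hypothetical helper: load a service account key file and verify that it
    contains the expected fields before it is uploaded as an MCLI secret."""
    data = json.loads(Path(path).read_text())
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"{path} has type {data['type']!r}, expected 'service_account'")
    return data
```

For example, check_service_account_file("my_gcp_credentials.json")["client_email"] returns the service account the secret will act as, which is a quick way to confirm you are about to upload the key you intended.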

Once you've created a GCP secret, the credentials file is mounted inside all of your runs. We set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of that file automatically, so libraries such as google-cloud-storage can detect the credentials without any extra configuration:

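from google.cloud import storage

# No credentials are passed explicitly: the client reads the key file that
# GOOGLE_APPLICATION_CREDENTIALS points to and takes the project from that file.
storage_client = storage.Client()

# List every bucket the service account can see in that project.
buckets = list(storage_client.list_buckets())
print("my buckets: ", buckets)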
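If a run cannot reach your bucket, a useful first check is whether the secret was mounted and which service account it belongs to. The sketch below is an illustrative helper (describe_mounted_credentials is a hypothetical name, not part of MCLI or google-cloud-storage). It reads the file that GOOGLE_APPLICATION_CREDENTIALS points to and reports the service account email and project without printing the private key. It only inspects the file and does not authenticate against Google.

```python
import json
import os


def describe_mounted_credentials() -> str:
    """Hypothetical debugging helper: summarize the service account key file
    referenced by GOOGLE_APPLICATION_CREDENTIALS, never exposing the key itself."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS is not set; "
            "was a secret created with `mcli create secret gcp`?"
        )
    if not os.path.isfile(path):
        raise FileNotFoundError(f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}")
    with open(path) as f:
        info = json.load(f)
    # Report only non-sensitive identifying fields.
    return f"{info['client_email']} (project {info['project_id']})"
```

Calling print(describe_mounted_credentials()) at the start of a training script logs which identity the run is using, which makes permission errors from list_buckets easier to diagnose.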

GCP User Auth Credentials Mounted as Environment Variables

The second way to add your GCP user credentials or HMAC key is to store the access key and its secret as environment variables for your runs:

mcli create secret env GCS_KEY=<GCS_KEY value>
mcli create secret env GCS_SECRET=<GCS_SECRET value>

This adds two environment variables, GCS_KEY and GCS_SECRET, to your runs. You can then access your bucket from your code with Apache Libcloud's GoogleStorageDriver, initialized as follows:

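import os

from libcloud.storage.drivers.google_storage import GoogleStorageDriver

# GCS_KEY is the access key and GCS_SECRET its secret, both added by the env secrets above.
# Pass any additional driver options (...) as further keyword arguments.
driver = GoogleStorageDriver(key=os.environ['GCS_KEY'], secret=os.environ['GCS_SECRET'])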
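Indexing os.environ directly raises a bare KeyError when a secret was not added to the run. The sketch below is a hypothetical helper (gcs_hmac_from_env is an illustrative name, not an MCLI or Libcloud API) that checks both variables up front and names whichever ones are missing. It assumes the variable names GCS_KEY and GCS_SECRET used above; pass different names if you created your secrets under other names.

```python
import os
from typing import Tuple


def gcs_hmac_from_env(key_var: str = "GCS_KEY", secret_var: str = "GCS_SECRET") -> Tuple[str, str]:
    """Hypothetical helper: return (access_key, secret) from the environment,
    failing fast with a message that lists every missing variable."""
    missing = [name for name in (key_var, secret_var) if not os.environ.get(name)]
    if missing:
        raise KeyError(f"missing environment variable(s): {', '.join(missing)}")
    return os.environ[key_var], os.environ[secret_var]
```

With this in place, key, secret = gcs_hmac_from_env() followed by GoogleStorageDriver(key=key, secret=secret) behaves like the initialization above, but reports every missing secret at once instead of stopping at the first one.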