Cloudflare R2#

Cloudflare R2 is an S3-compatible storage system. Developers can perform many CRUD operations on R2 using the AWS CLIs and SDKs, so in practice a Cloudflare R2 integration feels very much like an S3 integration.

Retrieve your R2 Secret Access Key and Access Key ID key-pair. If you do not have one, follow the instructions here to create a key-pair.

Store the information above in a credentials file (e.g. ~/.r2/credentials):

[default]
aws_access_key_id=<your_cloudflare_access_key_id>
aws_secret_access_key=<your_cloudflare_access_secret_key>
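The credentials file uses the standard AWS INI format, so you can also create it programmatically. A minimal sketch using Python's configparser (the angle-bracket values are placeholders, substitute your real keys):

```python
import configparser
import pathlib

# Build the [default] profile in standard AWS INI format.
# The values below are placeholders, not real credentials.
creds = configparser.ConfigParser()
creds["default"] = {
    "aws_access_key_id": "<your_cloudflare_access_key_id>",
    "aws_secret_access_key": "<your_cloudflare_access_secret_key>",
}

# Write the file to ~/.r2/credentials, creating the directory if needed.
path = pathlib.Path.home() / ".r2" / "credentials"
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w") as f:
    creds.write(f)
```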

Create an empty config file (e.g. ~/.r2/config) containing only:

[default]

Find your Cloudflare account ID using these instructions. Use the account ID to set a Databricks Mosaic AI platform environment variable secret:

mcli create secret env S3_ENDPOINT_URL='https://{ACCOUNT_ID}.r2.cloudflarestorage.com' 
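The endpoint URL is simply your account ID substituted into the R2 hostname. For example, building it in Python (the account ID shown is a made-up placeholder):

```python
# Hypothetical account ID, used purely for illustration.
account_id = "0123456789abcdef0123456789abcdef"

# R2's S3-compatible endpoint embeds the account ID in the hostname.
endpoint_url = f"https://{account_id}.r2.cloudflarestorage.com"
print(endpoint_url)
# → https://0123456789abcdef0123456789abcdef.r2.cloudflarestorage.com
```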

Now we can treat these credentials as if they were for AWS S3. Run the following command:

> mcli create secret s3
? What would you like to name this secret? my-r2-credentials
? Where is your S3 config file located? ~/.r2/config
? Where is your S3 credentials file located? ~/.r2/credentials
✔  Created secret: my-r2-credentials

The values for each of these prompts can also be passed directly using the --name, --config-file, and --credentials-file arguments, respectively.
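For example, the same secret can be created non-interactively in one command (using the paths from the steps above):

mcli create secret s3 --name my-r2-credentials --config-file ~/.r2/config --credentials-file ~/.r2/credentials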

Once you’ve created an S3 secret, we mount it inside all of your runs and export two environment variables:

  • $AWS_CONFIG_FILE: Path to your config file

  • $AWS_SHARED_CREDENTIALS_FILE: Path to your credentials file

A library like boto3 uses these environment variables by default to discover your S3 credentials:

import boto3
import os

# boto3 automatically pulls from $AWS_CONFIG_FILE and $AWS_SHARED_CREDENTIALS_FILE
s3 = boto3.client('s3', endpoint_url=os.environ['S3_ENDPOINT_URL'])

🙌