UCObjectStore

class composer.utils.UCObjectStore(path)

Utility class for uploading data to and downloading data from Databricks Unity Catalog (UC) Volumes.

Note

Using this object store requires the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables to be set with credentials that can access the files in the Unity Catalog Volumes.

Parameters

path (str) – The Databricks UC Volume path, of the format Volumes/<catalog-name>/<schema-name>/<volume-name>/path/to/folder. The path should always start with /Volumes and follow this format, because this object store supports only Unity Catalog Volumes, not other Databricks file systems.
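Because the store reads its credentials from the environment, it can help to fail fast before constructing it. The sketch below uses a hypothetical helper, missing_uc_credentials (not part of Composer), that reports which of the two variables from the note above are unset or empty. The catalog, schema, and volume names in the commented usage are placeholders.

```python
import os
from typing import List, Mapping, Optional

# The two variables the note above says UCObjectStore needs.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_uc_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required Databricks variables that are unset or empty.

    Hypothetical helper for illustration; it is not part of Composer.
    """
    source = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not source.get(name)]


# Example usage (requires Composer and valid credentials; the catalog,
# schema, and volume names below are placeholders):
#
# from composer.utils import UCObjectStore
#
# missing = missing_uc_credentials()
# if missing:
#     raise RuntimeError(f"Set {', '.join(missing)} before creating the store")
# store = UCObjectStore(path="/Volumes/my_catalog/my_schema/my_volume/checkpoints")
```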

download_object(object_name, filename, overwrite=False, callback=None)

Download the given object from UC Volumes to the specified filename.

Parameters
  • object_name (str) – The name of the object to download, i.e., its path relative to the root of the volume.

  • filename (str | Path) – The local path to which the file will be downloaded.

  • overwrite (bool, optional) – Whether to overwrite an existing file at filename. (default: False)

  • callback ((int) -> None, optional) – Unused.
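Since object_name is relative to the volume root, a common pattern is to mirror that relative path under a local directory. The following is a minimal sketch with a hypothetical helper, download_to_dir (not part of Composer); it accepts any object exposing download_object with the signature documented above, such as a UCObjectStore, and rejects names that would escape the local directory.

```python
from pathlib import Path, PurePosixPath


def download_to_dir(store, object_name: str, local_dir, overwrite: bool = False) -> Path:
    """Download ``object_name`` into ``local_dir``, mirroring its relative path.

    Hypothetical helper, not part of Composer. ``store`` is any object with
    ``download_object(object_name, filename, overwrite=...)``.
    """
    # object_name is relative to the volume root, e.g. "checkpoints/ep1.pt".
    relative = PurePosixPath(object_name)
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"Expected a path relative to the volume root, got {object_name!r}")
    destination = Path(local_dir).joinpath(*relative.parts)
    # Create intermediate local directories before downloading.
    destination.parent.mkdir(parents=True, exist_ok=True)
    # The overwrite flag is passed through; the store decides what happens
    # when the destination file already exists.
    store.download_object(object_name, str(destination), overwrite=overwrite)
    return destination
```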

Raises
get_object_size(object_name)

Get the size, in bytes, of an object in UC Volumes.

Parameters

object_name (str) – The name of the object.

Returns

int – The object size, in bytes.
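One use of get_object_size is a cheap sanity check after a download: compare the remote size with the size of the local file. The sketch below uses a hypothetical helper, download_matches_remote (not part of Composer), and assumes only that store exposes get_object_size as documented above. A size match does not prove the contents are identical; it only catches truncated or partial files.

```python
import os


def download_matches_remote(store, object_name: str, filename) -> bool:
    """Compare a downloaded file's size with the object's size in UC Volumes.

    Hypothetical helper, not part of Composer. ``store`` is any object with
    ``get_object_size(object_name) -> int``. This is a size check, not a checksum.
    """
    remote_bytes = store.get_object_size(object_name)
    local_bytes = os.path.getsize(filename)
    return remote_bytes == local_bytes
```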

Raises
get_uri(object_name)

Returns the URI for object_name.

Note

This function does not check that object_name is in the object store. It computes the URI statically.

Parameters

object_name (str) – The object name.

Returns

str – The URI for object_name in the object store.
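Because get_uri computes the URI statically, it returns a URI even for an object that does not exist. When existence matters, one option is to check list_objects first. The sketch below uses a hypothetical helper, uri_if_exists (not part of Composer), and assumes that list_objects returns names in the same form that get_uri accepts (relative to the volume root); verify that assumption against your own store before relying on it.

```python
from typing import Optional


def uri_if_exists(store, object_name: str) -> Optional[str]:
    """Return ``store.get_uri(object_name)`` only if the object is listed.

    Hypothetical helper, not part of Composer. Assumes list_objects returns
    names in the same relative form that get_uri takes.
    """
    # Using the object's own name as the prefix keeps the listing small.
    if object_name in store.list_objects(object_name):
        return store.get_uri(object_name)
    return None
```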

list_objects(prefix)

List all objects in the object store with the given prefix.

Parameters

prefix (str) – The prefix to search for.

Returns

list[str] – A list of object names that match the prefix.
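A typical use of list_objects is locating the most recent checkpoint under a folder. The sketch below uses a hypothetical helper, latest_object (not part of Composer), that filters the listing by suffix and picks the lexicographically greatest name. This only matches "most recent" when step numbers in the names are zero-padded, since string comparison orders "ep10" before "ep9".

```python
from typing import Optional


def latest_object(store, prefix: str, suffix: str = "") -> Optional[str]:
    """Return the lexicographically greatest object name under ``prefix`` ending in ``suffix``.

    Hypothetical helper, not part of Composer. ``store`` is any object with
    ``list_objects(prefix) -> list[str]``.
    """
    candidates = [name for name in store.list_objects(prefix) if name.endswith(suffix)]
    # max() on strings compares lexicographically, so zero-pad step numbers.
    return max(candidates) if candidates else None
```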

upload_object(object_name, filename, callback=None)

Upload a local file to UC Volumes.

Parameters
  • object_name (str) – Name of the stored object in UC Volumes, relative to the volume root.

  • filename (str | Path) – Path to the object on disk.

  • callback ((int, int) -> None, optional) – Unused.
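upload_object handles a single file, so uploading a whole local directory means walking it and building one object name per file. The sketch below uses a hypothetical helper, upload_dir (not part of Composer); it assumes only that store exposes upload_object(object_name, filename) as documented above, and builds object names with forward slashes regardless of the local operating system.

```python
from pathlib import Path, PurePosixPath
from typing import List


def upload_dir(store, local_dir, remote_prefix: str) -> List[str]:
    """Upload every file under ``local_dir`` to ``remote_prefix`` in the volume.

    Hypothetical helper, not part of Composer. ``store`` is any object with
    ``upload_object(object_name, filename)``. Returns the uploaded object names.
    """
    root = Path(local_dir)
    uploaded = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Object names are relative to the volume root and use "/" separators.
        relative = PurePosixPath(*path.relative_to(root).parts)
        object_name = str(PurePosixPath(remote_prefix) / relative)
        store.upload_object(object_name, str(path))
        uploaded.append(object_name)
    return uploaded
```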

static validate_path(path)

Parses the given path to extract the UC Volume prefix from the path.

Note

This function uses only the first four directories of the path to construct the UC Volumes prefix and ignores the rest of the path.

Parameters

path (str) – The Databricks UC Volume path, of the format Volumes/<catalog-name>/<schema-name>/<volume-name>/path/to/folder.
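The prefix extraction described in the note above can be illustrated with a short sketch. uc_volume_prefix is a hypothetical re-implementation of the documented behavior, not Composer's source: it checks that the path starts with Volumes, keeps the first four components (Volumes, catalog, schema, volume), and drops the rest. It normalizes the result to a leading slash; the real validate_path may format its return value differently or raise different messages.

```python
from pathlib import PurePosixPath


def uc_volume_prefix(path: str) -> str:
    """Return the /Volumes/<catalog>/<schema>/<volume> prefix of ``path``.

    Hypothetical sketch of the documented behavior, not Composer's code.
    """
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    if not parts or parts[0] != "Volumes":
        raise ValueError(f"UC Volume paths must start with /Volumes, got {path!r}")
    if len(parts) < 4:
        raise ValueError(
            "Expected Volumes/<catalog-name>/<schema-name>/<volume-name>/..., "
            f"got {path!r}"
        )
    # Everything after the volume name is ignored when forming the prefix.
    return "/" + "/".join(parts[:4])
```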