UCObjectStore
- class composer.utils.UCObjectStore(path)
Utility class for uploading and downloading data from Databricks Unity Catalog (UC) Volumes.
Note
Using this object store requires setting the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables with credentials that can access the files in the Unity Catalog Volumes.
- Parameters
path (str) – The Databricks UC Volume path, in the format Volumes/<catalog-name>/<schema-name>/<volume-name>/path/to/folder. Note that this prefix should always start with /Volumes and adhere to the above format, since this object store only supports Unity Catalog Volumes and not other Databricks filesystems.
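The note above names two required environment variables. A minimal sketch of a pre-flight check follows; the helper name `missing_databricks_env` and the catalog, schema, and volume names in the comment are hypothetical and not part of composer.

```python
import os

# The two variables the note above says must be set.
REQUIRED_ENV_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_env(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; composer does not provide it.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


# Possible usage (needs composer installed and valid credentials):
#
#   from composer.utils import UCObjectStore
#   if not missing_databricks_env():
#       # Placeholder catalog/schema/volume names.
#       store = UCObjectStore(path="Volumes/my_catalog/my_schema/my_volume/checkpoints")
```

Checking up front gives a clearer failure message than an authentication error raised later during a transfer.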
- download_object(object_name, filename, overwrite=False, callback=None)
Download the given object from UC Volumes to the specified filename.
- Parameters
object_name (str) – The name of the object to download, i.e. its path relative to the root of the volume.
filename (str | Path) – The local path where the file should be downloaded.
overwrite (bool, optional) – Whether to overwrite an existing file at filename, if it exists. (default: False)
callback ((int) -> None, optional) – Unused.
- Raises
FileNotFoundError – If the file was not found in UC Volumes.
ObjectStoreTransientError – If there was any other error querying the Databricks UC Volumes that should be retried.
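As a sketch of how a caller might handle the documented FileNotFoundError, the wrapper below treats a missing object as a normal outcome rather than a crash. The name `fetch_if_present` is hypothetical; `store` is assumed to be any object exposing `download_object` with the signature shown above, such as a `UCObjectStore` instance.

```python
from pathlib import Path


def fetch_if_present(store, object_name, filename, overwrite=False):
    """Download object_name to filename.

    Returns True on success and False if the store reports the object as
    missing. Other exceptions (for example transient errors) propagate.
    Hypothetical helper for illustration; not part of composer.
    """
    target = Path(filename)
    # Make sure the local destination directory exists before downloading.
    target.parent.mkdir(parents=True, exist_ok=True)
    try:
        store.download_object(object_name, str(target), overwrite=overwrite)
    except FileNotFoundError:
        return False
    return True
```

Errors other than FileNotFoundError are deliberately left uncaught, so a retry policy for ObjectStoreTransientError can be layered on by the caller.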
- get_object_size(object_name)
Get the size, in bytes, of the object in UC Volumes.
- Parameters
object_name (str) – The name of the object.
- Returns
int – The object size, in bytes.
- Raises
FileNotFoundError – If the file was not found in the object store.
IsADirectoryError – If the object is a directory, not a file.
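The two documented exceptions can be told apart by the caller. A minimal sketch, assuming `store` is any object with a `get_object_size` method as described above; the helper name `describe_object` is hypothetical.

```python
def describe_object(store, object_name):
    """Return a short human-readable status line for object_name.

    Hypothetical helper for illustration; not part of composer.
    """
    try:
        size = store.get_object_size(object_name)
    except FileNotFoundError:
        return f"{object_name}: not found"
    except IsADirectoryError:
        return f"{object_name}: is a directory"
    return f"{object_name}: {size} bytes"
```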
- get_uri(object_name)
Returns the URI for object_name.
Note
This function does not check that object_name is in the object store. It computes the URI statically.
- Parameters
object_name (str) – The object name.
- Returns
str – The URI for object_name in the object store.
- list_objects(prefix)
List all objects in the object store with the given prefix.
- Parameters
prefix (str) – The prefix to search for.
- Returns
list[str] – A list of object names that match the prefix.
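Because `list_objects` matches on a prefix only, any further filtering happens on the client side. A minimal sketch, assuming `store` exposes `list_objects(prefix)` returning a list of strings as documented; the helper name `list_with_suffix` is hypothetical.

```python
def list_with_suffix(store, prefix, suffix):
    """Return the sorted object names under prefix that end with suffix.

    Hypothetical helper for illustration; not part of composer.
    """
    return sorted(name for name in store.list_objects(prefix) if name.endswith(suffix))
```

For example, `list_with_suffix(store, "checkpoints", ".pt")` would keep only names ending in `.pt`; both arguments here are placeholders.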
- upload_object(object_name, filename, callback=None)
Upload a local file to UC Volumes.
- static validate_path(path)
Parses the given path to extract the UC Volume prefix from the path.
Note
This function uses only the first 4 directories of the path to construct the UC Volumes prefix and ignores the remaining directories.
- Parameters
path (str) – The Databricks UC Volume path of the format Volumes/<catalog-name>/<schema-name>/<volume-name>/path/to/folder.
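To illustrate the "first 4 directories" rule, here is a standalone sketch of that kind of parsing. It is not composer's implementation, and the exact string that `validate_path` returns may differ; the function name `uc_volume_prefix` and the error message are assumptions.

```python
def uc_volume_prefix(path):
    """Return the first four components of a UC Volume path joined by "/".

    Illustrative only; the real validate_path may return a different form.
    """
    # Drop empty components so a leading or trailing slash does not matter.
    parts = [part for part in path.split("/") if part]
    if len(parts) < 4 or parts[0] != "Volumes":
        raise ValueError(
            "expected Volumes/<catalog-name>/<schema-name>/<volume-name>/..., "
            f"got {path!r}"
        )
    # Anything after the volume name is ignored.
    return "/".join(parts[:4])
```

With placeholder names, `uc_volume_prefix("Volumes/cat/sch/vol/path/to/folder")` gives `"Volumes/cat/sch/vol"`, and a path with fewer than four components raises `ValueError`.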