CloudUploader
- class streaming.base.storage.CloudUploader(out, keep_local=False, progress_bar=False, retry=2, exist_ok=False)
Upload local files to cloud storage.
- clear_local(local)
Remove the local file if local removal is enabled, that is, if keep_local is False.
- Parameters
local (str) – A local file path.
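The removal rule can be illustrated with a standalone function. This is a minimal sketch, assuming removal is gated only on keep_local and that the path must point to an existing regular file; the free function `clear_local_sketch` is hypothetical and is not part of the streaming library.

```python
import os
import tempfile


def clear_local_sketch(local: str, keep_local: bool) -> bool:
    """Hypothetical stand-in for CloudUploader.clear_local.

    Deletes `local` only when keep_local is False and the path is an
    existing regular file (the isfile check is an assumption).
    Returns True if a file was removed.
    """
    if not keep_local and os.path.isfile(local):
        os.remove(local)
        return True
    return False


# Usage: create a throwaway shard file, then try to clear it twice.
with tempfile.TemporaryDirectory() as tmp:
    shard = os.path.join(tmp, "shard.00000.mds")
    with open(shard, "wb") as f:
        f.write(b"\x00")
    kept = clear_local_sketch(shard, keep_local=True)      # file survives
    removed = clear_local_sketch(shard, keep_local=False)  # file is deleted
    still_exists = os.path.exists(shard)
```

Returning a boolean is only a convenience for this sketch; the documented method does not specify a return value.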
- classmethod get(out, keep_local=False, progress_bar=False, retry=2, exist_ok=False)
Instantiate a cloud provider uploader or a local uploader based on the remote path.
- Parameters
out (str | Tuple[str, str]) – Output dataset directory to save shard files.
If out is a local directory, shard files are saved locally.
If out is a remote directory, a local temporary directory is created to cache the shard files, which are then uploaded to the remote location. The temporary directory is deleted once the shards are uploaded.
If out is a tuple of (local_dir, remote_dir), shard files are saved in local_dir and also uploaded to the remote location.
keep_local (bool) – If the dataset is uploaded, whether to keep the local shard files or remove them after uploading. Defaults to False.
progress_bar (bool) – Display TQDM progress bars when uploading output dataset files to a remote location. Defaults to False.
retry (int) – Number of times to retry uploading a file. Defaults to 2.
exist_ok (bool) – When exist_ok = False, raise an error if the local part of out already exists and has contents. Defaults to False.
- Returns
CloudUploader – An instance of a subclass.
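The three forms of out described above can be sketched as a small dispatch function that returns a (local_dir, remote_dir) pair and applies the exist_ok rule. This is a hypothetical illustration, not the library's implementation: the name `resolve_out_sketch`, the use of a URL scheme to detect a remote path, and the FileExistsError exception type are all assumptions, and POSIX-style local paths are assumed (a Windows drive letter such as `C:` would parse as a scheme).

```python
import os
import tempfile
from typing import Optional, Tuple, Union
from urllib.parse import urlparse


def resolve_out_sketch(
    out: Union[str, Tuple[str, str]], exist_ok: bool = False
) -> Tuple[str, Optional[str]]:
    """Hypothetical sketch mapping `out` to (local_dir, remote_dir).

    - tuple (local_dir, remote_dir): save locally and upload to remote_dir
    - remote URL string: cache shards in a fresh temp dir, upload to the URL
    - plain local path: save locally, nothing to upload
    """
    if isinstance(out, tuple):
        local, remote = out
    elif urlparse(out).scheme:  # e.g. "s3://bucket/path" has scheme "s3"
        local, remote = tempfile.mkdtemp(), out
    else:
        local, remote = out, None

    # Documented rule: unless exist_ok, refuse a local dir that has contents.
    if not exist_ok and os.path.isdir(local) and os.listdir(local):
        raise FileExistsError(f"Directory is not empty: {local}")
    return local, remote


# Usage
with tempfile.TemporaryDirectory() as tmp:
    plain = resolve_out_sketch(tmp)
    paired = resolve_out_sketch((tmp, "s3://bucket/dataset"))
    cache_dir, remote_url = resolve_out_sketch("s3://bucket/dataset")
    os.rmdir(cache_dir)  # clean up the empty temp cache dir created above

    # Once the local directory has contents, exist_ok=False rejects it.
    open(os.path.join(tmp, "index.json"), "w").close()
    try:
        resolve_out_sketch(tmp)
        rejected = False
    except FileExistsError:
        rejected = True
    allowed = resolve_out_sketch(tmp, exist_ok=True)
```

In the real classmethod, the resolved remote scheme also selects which uploader subclass is instantiated; that selection step is omitted here.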
- list_objects(prefix=None)
List all objects in the object store with the given prefix.
- Parameters
prefix (Optional[str], optional) – The prefix to search for. Defaults to None.
- Returns
List[str] – A list of object names that match the prefix.
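Object stores generally use a flat key namespace, so prefix matching is a plain string-prefix test on object names rather than a directory lookup. The sketch below mimics that behavior over a local directory tree. It is a hypothetical local analog: the function name `list_objects_sketch`, the use of POSIX-style relative paths as object names, and the sorted output are assumptions, and the library does not promise any particular ordering.

```python
import os
import tempfile
from typing import List, Optional


def list_objects_sketch(root: str, prefix: Optional[str] = None) -> List[str]:
    """Hypothetical local analog of list_objects.

    Treats every file under `root` as an object named by its POSIX-style
    path relative to `root`, and keeps names that start with `prefix`
    (None or "" keeps everything).
    """
    prefix = prefix or ""
    names = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            name = rel.replace(os.sep, "/")
            if name.startswith(prefix):
                names.append(name)
    return sorted(names)


# Usage: build a tiny tree of shard files and filter it by prefix.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["train/shard.00000.mds", "train/shard.00001.mds", "val/shard.00000.mds"]:
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()
    all_names = list_objects_sketch(tmp)
    train_names = list_objects_sketch(tmp, prefix="train/")
```

Because matching is on raw strings, a prefix such as "train" (without the slash) would also match a hypothetical "training/" folder.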
- upload_file(filename)
Upload a file from the local instance to the remote instance.
- Parameters
filename (str) – File to upload.
- Raises
NotImplementedError – Override this method in your subclass.
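Because the base method only raises NotImplementedError, each concrete uploader supplies its own upload_file. The sketch below shows that subclassing contract with a hypothetical base class and a subclass that "uploads" by copying into another local directory, retrying on OSError. The class names `UploaderSketch` and `LocalDirUploaderSketch`, the attribute names, and the reading of retry as extra attempts after the first are all assumptions; the library may count attempts differently.

```python
import os
import shutil
import tempfile


class UploaderSketch:
    """Hypothetical minimal base class mirroring the documented contract."""

    def __init__(self, local: str, remote: str, retry: int = 2) -> None:
        self.local = local
        self.remote = remote
        self.retry = retry

    def upload_file(self, filename: str) -> None:
        raise NotImplementedError("Override this method in your subclass.")


class LocalDirUploaderSketch(UploaderSketch):
    """Hypothetical subclass that 'uploads' by copying into another directory."""

    def upload_file(self, filename: str) -> None:
        src = os.path.join(self.local, filename)
        dst = os.path.join(self.remote, filename)
        # Assumption: one initial attempt plus up to `retry` retries.
        for attempt in range(self.retry + 1):
            try:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copyfile(src, dst)
                return
            except OSError:
                if attempt == self.retry:
                    raise  # out of retries: surface the last error


# Usage
with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as remote:
    with open(os.path.join(local, "shard.00000.mds"), "wb") as f:
        f.write(b"data")
    LocalDirUploaderSketch(local, remote).upload_file("shard.00000.mds")
    with open(os.path.join(remote, "shard.00000.mds"), "rb") as f:
        uploaded = f.read()
    try:
        UploaderSketch(local, remote).upload_file("shard.00000.mds")
        base_raised = False
    except NotImplementedError:
        base_raised = True
```

A real cloud subclass would replace the copy with a provider SDK call, but the shape stays the same: resolve the local and remote paths for filename, attempt the transfer, and re-raise once retries are exhausted.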