HFUploader

class streaming.base.storage.HFUploader(out, keep_local=False, progress_bar=False, retry=2, exist_ok=False)

Upload files from the local machine to a Hugging Face dataset.

Parameters
  • out (str) –

    Output dataset directory to save shard files.

    1. If out is a local directory, shard files are saved locally.

    2. If out is a remote directory then the shard files are uploaded to the remote location.

  • keep_local (bool) – If the dataset is uploaded, whether to keep the local dataset shard file or remove it after uploading. Defaults to False.

  • progress_bar (bool) – Display TQDM progress bars while uploading output dataset files to a remote location. Defaults to False.

  • retry (int) – Number of times to retry uploading a file. Defaults to 2.

  • exist_ok (bool) – If False, raise an error if the local part of out already exists and has contents. Defaults to False.
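The local/remote distinction for out and the exist_ok check described above can be illustrated with a minimal sketch. This is not the library's implementation: the helper names is_remote and check_local_dir are hypothetical, the `hf://` path is an assumed example of a remote location, and the sketch assumes a remote path is one that carries a URL scheme.

```python
import os
import tempfile
from urllib.parse import urlparse


def is_remote(out: str) -> bool:
    """Hypothetical helper: treat any path that has a URL scheme as remote.

    Assumption: remote locations look like `scheme://...`; plain POSIX
    paths have no scheme and are treated as local.
    """
    return urlparse(out).scheme != ""


def check_local_dir(path: str, exist_ok: bool = False) -> None:
    """Hypothetical helper mirroring the documented exist_ok=False behaviour.

    Raises FileExistsError when the directory exists and is not empty.
    The real class may raise a different exception type.
    """
    if not exist_ok and os.path.isdir(path) and os.listdir(path):
        raise FileExistsError(f"Directory is not empty: {path}")


print(is_remote("/tmp/shards"))                      # local path
print(is_remote("hf://datasets/my-org/my-dataset"))  # assumed remote path
```

Under these assumptions, a non-empty local directory is accepted only when exist_ok=True is passed.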

check_dataset_exists()

Raise an exception if the dataset does not exist.

Raises

error – Dataset does not exist.

upload_file(filename)

Upload a file from the local instance to Hugging Face.

Parameters

filename (str) – File to upload.
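A typical use of upload_file is to call it once per shard file. The following is a hedged sketch of that pattern: upload_all and make_uploader are hypothetical helpers, the constructor arguments follow the signature documented above, and the form of each filename (relative or absolute) is an assumption that should be checked against the library. The import is deferred so the sketch can be read and run without the package installed.

```python
from typing import Iterable, List


def upload_all(uploader, filenames: Iterable[str]) -> List[str]:
    """Hypothetical helper: call upload_file() for each name, in order."""
    uploaded = []
    for name in filenames:
        uploader.upload_file(name)
        uploaded.append(name)
    return uploaded


def make_uploader(out: str):
    """Hypothetical factory; requires the streaming package when called."""
    # Imported lazily so that defining this function needs no dependency.
    from streaming.base.storage import HFUploader

    # Keyword names are taken from the documented constructor signature.
    return HFUploader(out=out, keep_local=True, progress_bar=False, retry=2)
```

With the package installed and credentials configured, `upload_all(make_uploader(out), names)` would upload each listed file, where out and names are supplied by the caller.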