- composer.datasets.build_streaming_imagenet1k_dataloader(global_batch_size, remote, *, local='/tmp/mds-cache/mds-imagenet1k', split='train', drop_last=True, shuffle=True, resize_size=-1, crop_size=224, predownload=100000, keep_zip=None, download_retry=2, download_timeout=60, validate_hash=None, shuffle_seed=None, num_canonical_nodes=None, **dataloader_kwargs)
Builds a streaming dataloader for the ImageNet-1k dataset.
global_batch_size (int) – Global batch size.
remote (str) – Remote directory (S3 or local filesystem) where dataset is stored.
local (str, optional) – Local filesystem directory where the dataset is cached during operation. Defaults to '/tmp/mds-cache/mds-imagenet1k'.
split (str) – Which split of the dataset to use. Either 'train' or 'val'. Default: 'train'.
drop_last (bool, optional) – Whether to drop the last samples. Default: True.
shuffle (bool, optional) – Whether to shuffle the dataset. Defaults to True.
resize_size (int, optional) – The resize size to use. Use -1 to not resize. Default: -1.
crop_size (int) – The crop size to use. Default: 224.
predownload (int, optional) – Target number of samples ahead to download the shards of while iterating. Defaults to 100000.
keep_zip (bool, optional) – Whether to keep or delete the compressed file when decompressing downloaded shards. If set to None, keep iff remote is local. Defaults to None.
download_retry (int) – Number of download re-attempts before giving up. Defaults to 2.
download_timeout (float) – Number of seconds to wait for a shard to download before raising an exception. Defaults to 60.
validate_hash (str, optional) – Optional hash or checksum algorithm to use to validate shards. Defaults to None.
shuffle_seed (int, optional) – Seed for shuffling, or None for a random seed. Defaults to None.
num_canonical_nodes (int, optional) – Canonical number of nodes for shuffling with resumption. Defaults to None, which is interpreted as the number of nodes of the initial run.
**dataloader_kwargs (Dict[str, Any]) – Additional settings for the dataloader (e.g. num_workers).
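The parameters above can be assembled as in the following minimal sketch. The helper names `imagenet_loader_kwargs` and `build_loader` are hypothetical, the bucket path in the usage note is a placeholder, and the choice to disable shuffling and resize to 256 for the validation split is a common convention assumed here, not something this function requires.

```python
from typing import Any, Dict


def imagenet_loader_kwargs(global_batch_size: int, remote: str,
                           split: str = "train",
                           **dataloader_kwargs: Any) -> Dict[str, Any]:
    """Collect keyword arguments for build_streaming_imagenet1k_dataloader.

    Hypothetical helper: it only builds a dict and does not touch the network.
    """
    if split not in ("train", "val"):
        raise ValueError(f"split must be 'train' or 'val', got {split!r}")
    is_train = split == "train"
    kwargs: Dict[str, Any] = {
        "global_batch_size": global_batch_size,
        "remote": remote,
        "local": "/tmp/mds-cache/mds-imagenet1k",  # documented default
        "split": split,
        # Assumption: shuffle and drop incomplete batches only when training.
        "drop_last": is_train,
        "shuffle": is_train,
        # Assumption: resize to 256 before the 224 crop for validation;
        # -1 (the documented default) disables resizing.
        "resize_size": -1 if is_train else 256,
        "crop_size": 224,
    }
    # Extra settings such as num_workers are forwarded to the dataloader.
    kwargs.update(dataloader_kwargs)
    return kwargs


def build_loader(**kwargs: Any):
    """Call the documented builder; requires composer to be installed."""
    # Imported lazily so the helper above works without composer installed.
    from composer.datasets import build_streaming_imagenet1k_dataloader
    return build_streaming_imagenet1k_dataloader(**kwargs)
```

A call might then look like `build_loader(**imagenet_loader_kwargs(256, "s3://my-bucket/imagenet1k", split="val", num_workers=8))`, where the bucket path stands in for wherever the dataset is actually stored.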