build_streaming_imagenet1k_dataloader
- composer.datasets.build_streaming_imagenet1k_dataloader(global_batch_size, remote, *, local='/tmp/mds-cache/mds-imagenet1k', split='train', drop_last=True, shuffle=True, resize_size=-1, crop_size=224, predownload=100000, keep_zip=None, download_retry=2, download_timeout=60, validate_hash=None, shuffle_seed=None, num_canonical_nodes=None, **dataloader_kwargs)
Builds a dataloader for a streaming ImageNet-1k dataset.
- Parameters
global_batch_size (int) – Global batch size.
remote (str) – Remote directory (S3 or local filesystem) where the dataset is stored.
local (str, optional) – Local filesystem directory where the dataset is cached during operation. Defaults to '/tmp/mds-cache/mds-imagenet1k'.
split (str) – Which split of the dataset to use. Either 'train' or 'val'. Defaults to 'train'.
drop_last (bool, optional) – Whether to drop the last samples. Defaults to True.
shuffle (bool, optional) – Whether to shuffle the dataset. Defaults to True.
resize_size (int, optional) – The resize size to use. Use -1 to not resize. Defaults to -1.
crop_size (int) – The crop size to use. Defaults to 224.
predownload (int, optional) – Target number of samples ahead of the current position for which shards are downloaded while iterating. Defaults to 100_000.
keep_zip (bool, optional) – Whether to keep or delete the compressed file when decompressing downloaded shards. If set to None, the file is kept if and only if remote is local. Defaults to None.
download_retry (int) – Number of download re-attempts before giving up. Defaults to 2.
download_timeout (float) – Number of seconds to wait for a shard to download before raising an exception. Defaults to 60.
validate_hash (str, optional) – Hash or checksum algorithm to use to validate shards. Defaults to None.
shuffle_seed (int, optional) – Seed for shuffling, or None for a random seed. Defaults to None.
num_canonical_nodes (int, optional) – Canonical number of nodes for shuffling with resumption. Defaults to None, which is interpreted as the number of nodes of the initial run.
**dataloader_kwargs (Dict[str, Any]) – Additional settings for the dataloader (e.g. num_workers).
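
The parameters above might be combined as in the following minimal sketch. It is an illustration under assumptions, not a recommended configuration: the helper names imagenet_loader_kwargs and build_loader, the per-split cache directory, the validation resize of 256, the seed 17, and num_workers=8 are all hypothetical choices. Only the function name, its import path, and the argument names come from the signature documented here.

```python
from typing import Any, Dict


def imagenet_loader_kwargs(remote: str, split: str = "train") -> Dict[str, Any]:
    """Collect keyword arguments for build_streaming_imagenet1k_dataloader.

    Hypothetical helper: the values chosen below are assumptions.
    """
    if split not in ("train", "val"):
        raise ValueError(f"split must be 'train' or 'val', got {split!r}")
    is_train = split == "train"
    return {
        "remote": remote,
        # Assumed per-split cache directory so the splits do not share a cache.
        "local": f"/tmp/mds-cache/mds-imagenet1k-{split}",
        "split": split,
        # Shuffle and drop incomplete batches only while training.
        "drop_last": is_train,
        "shuffle": is_train,
        # -1 disables resizing; 256 for validation is an assumed value.
        "resize_size": -1 if is_train else 256,
        "crop_size": 224,
        # Fixed seed for a repeatable shuffle order; None means a random seed.
        "shuffle_seed": 17 if is_train else None,
        # Not a named parameter: forwarded through **dataloader_kwargs.
        "num_workers": 8,
    }


def build_loader(global_batch_size: int, remote: str, split: str = "train"):
    """Build the dataloader; requires Composer and access to the remote dataset."""
    # Imported lazily so the helper above can be used without Composer installed.
    from composer.datasets import build_streaming_imagenet1k_dataloader

    return build_streaming_imagenet1k_dataloader(
        global_batch_size, **imagenet_loader_kwargs(remote, split)
    )
```

With Composer installed and a dataset available, a call such as build_loader(256, 's3://my-bucket/mds-imagenet1k', 'val') would build the validation dataloader. The bucket name is a placeholder.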