composer.utils.dist.get_sampler(dataset, *, drop_last=False, shuffle=False)

Constructs a DistributedSampler for a dataset.

The DistributedSampler assumes that each rank has a complete copy of the dataset. It ensures that each rank sees a unique shard for each epoch containing len(dataset) / get_world_size() samples.
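The sharding described above can be illustrated with a small, dependency-free sketch. This is not Composer's or PyTorch's code; `shard_indices` is a hypothetical helper that assumes rank-strided assignment over an unshuffled index list, with padding by wrapping around to the start when `drop_last` is false and the dataset does not divide evenly.

```python
import math


def shard_indices(num_samples, world_size, rank, drop_last=False):
    """Hypothetical helper: indices one rank would see in an epoch (no shuffling)."""
    indices = list(range(num_samples))
    if drop_last:
        # Trim the tail so every rank gets the same number of samples.
        per_rank = num_samples // world_size
        indices = indices[: per_rank * world_size]
    else:
        # Pad by repeating the first indices so the total divides evenly.
        per_rank = math.ceil(num_samples / world_size)
        padding = per_rank * world_size - num_samples
        indices += indices[:padding]
    # Each rank takes every world_size-th index, starting at its own rank.
    return indices[rank::world_size]


# 10 samples across 4 ranks, without and with drop_last.
shards = [shard_indices(10, 4, rank) for rank in range(4)]
shards_dropped = [shard_indices(10, 4, rank, drop_last=True) for rank in range(4)]
```

In this sketch, every rank receives the same number of indices. Without `drop_last`, a few samples are repeated to fill the last positions; with it, the trailing samples are skipped for that epoch.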


If the dataset is already sharded by rank, use a SequentialSampler or RandomSampler.

  • dataset (Dataset) – The dataset.

  • drop_last (bool) – Whether to drop the last batch.

  • shuffle (bool) – Whether to shuffle the dataset.

Returns – The sampler.