streaming

MosaicML Streaming Datasets for cloud-native model training.

Classes

CSVWriter

Writes a streaming CSV dataset.

JSONWriter

Writes a streaming JSON dataset.

LocalDataset

A streaming dataset whose shards reside locally, exposed as a map-style PyTorch Dataset.

MDSWriter

Writes a streaming MDS dataset.

Stream

A dataset, or a sub-dataset when mixing several, from which samples are streamed and cached.

StreamingDataLoader

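A streaming data loader: a PyTorch DataLoader that also tracks how many samples it has yielded, so training can be checkpointed and resumed mid-epoch.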

StreamingDataset

A mid-epoch-resumable streaming/caching PyTorch IterableDataset.

TSVWriter

Writes a streaming TSV dataset.

XSVWriter

Writes a streaming XSV dataset (delimiter-separated values with a caller-chosen separator).