composer.datasets.mnist
MNIST image classification dataset.
The MNIST dataset is a collection of labeled 28x28 grayscale images of handwritten digits 0-9. See the Wikipedia entry for more details.
Hparams
These classes are used with yahp for YAML-based configuration.
MNISTDatasetHparams | Defines an instance of the MNIST dataset for image classification.
MNISTWebDatasetHparams | Defines an instance of the MNIST WebDataset for image classification.
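To make the YAML-driven flow concrete, here is a minimal stand-in that uses only the standard library rather than yahp itself. The `MNISTConfigSketch` dataclass and the `from_mapping` helper are hypothetical names. The field names and defaults are copied from the `MNISTDatasetHparams` signature documented on this page, and the dictionary plays the role of an already-parsed YAML file.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class MNISTConfigSketch:
    """Hypothetical stand-in mirroring a subset of the MNISTDatasetHparams fields."""

    use_synthetic: bool = False
    is_train: bool = True
    drop_last: bool = True
    shuffle: bool = True
    datadir: Optional[str] = None
    download: bool = True


def from_mapping(raw: Mapping[str, Any]) -> MNISTConfigSketch:
    """Build a config from a parsed-YAML-style mapping, rejecting unknown keys."""
    known = {f.name for f in fields(MNISTConfigSketch)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    return MNISTConfigSketch(**raw)


# Plays the role of a YAML file with `datadir: /data/mnist` and `is_train: false`.
config = from_mapping({"datadir": "/data/mnist", "is_train": False})
print(config)
```

Keys that are omitted fall back to the documented defaults, which is the behavior one would typically want from a configuration layer. How yahp itself validates and instantiates hparams is not shown here.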
- class composer.datasets.mnist.MNISTDatasetHparams(use_synthetic=False, synthetic_num_unique_samples=100, synthetic_device='cpu', synthetic_memory_format=MemoryFormat.CONTIGUOUS_FORMAT, is_train=True, drop_last=True, shuffle=True, datadir=None, download=True)
Bases: composer.datasets.hparams.DatasetHparams, composer.datasets.hparams.SyntheticHparamsMixin
Defines an instance of the MNIST dataset for image classification.
- Parameters
use_synthetic (bool, optional) – Whether to use synthetic data. Default: False.
synthetic_num_unique_samples (int, optional) – The number of unique samples to allocate memory for. Ignored if use_synthetic is False. Default: 100.
synthetic_device (str, optional) – The device to store the sample pool on. Set to 'cuda' to store samples on the GPU and eliminate PCI-e transfers by the dataloader. Set to 'cpu' to move data between host memory and the device on every batch. Ignored if use_synthetic is False. Default: 'cpu'.
synthetic_memory_format (MemoryFormat, optional) – The MemoryFormat to use. Ignored if use_synthetic is False. Default: MemoryFormat.CONTIGUOUS_FORMAT.
datadir (str) – The path to the data directory.
is_train (bool) – Whether to load the training data or validation data. Default: True.
drop_last (bool) – If the number of samples is not divisible by the batch size, whether to drop the last batch or pad the last batch with zeros. Default: True.
shuffle (bool) – Whether to shuffle the dataset. Default: True.
download (bool, optional) – Whether to download the dataset, if needed. Default: True.
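The synthetic options describe a fixed pool of unique samples that stands in for real data. The sketch below illustrates that idea with the standard library only. `SyntheticPoolSketch` is a hypothetical class, not Composer's implementation, and the wrap-around indexing is an assumption made for illustration.

```python
import random
from typing import List, Tuple

IMAGE_SIDE = 28   # MNIST images are 28x28
NUM_CLASSES = 10  # digits 0-9


class SyntheticPoolSketch:
    """Hypothetical pool of random (image, label) pairs, reused cyclically."""

    def __init__(self, total_samples: int, num_unique_samples: int = 100, seed: int = 0):
        rng = random.Random(seed)
        self.total_samples = total_samples
        # Memory is allocated only for the unique samples, not for every index.
        self.pool: List[Tuple[List[float], int]] = [
            (
                [rng.random() for _ in range(IMAGE_SIDE * IMAGE_SIDE)],
                rng.randrange(NUM_CLASSES),
            )
            for _ in range(num_unique_samples)
        ]

    def __len__(self) -> int:
        return self.total_samples

    def __getitem__(self, index: int) -> Tuple[List[float], int]:
        if not 0 <= index < self.total_samples:
            raise IndexError(index)
        # Assumption: indices beyond the pool wrap around to an existing sample.
        return self.pool[index % len(self.pool)]


dataset = SyntheticPoolSketch(total_samples=1000, num_unique_samples=100)
image, label = dataset[250]
print(len(dataset), len(dataset.pool), len(image), label)
```

A small pool keeps memory use constant regardless of the nominal dataset length, which is why a setting such as `synthetic_num_unique_samples` can stay small even for long runs.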
- initialize_object(batch_size, dataloader_hparams)
Creates a DataLoader or DataSpec for this dataset.
- Parameters
batch_size (int) – The size of the batch the dataloader should yield. This batch size is device-specific and already incorporates the world size.
dataloader_hparams (DataLoaderHparams) – The dataset-independent hparams for the dataloader.
- Returns
DataLoader or DataSpec – The DataLoader, or if the dataloader yields batches of custom types, a DataSpec.
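Because `batch_size` is described as device-specific, the caller is expected to have accounted for the world size before calling `initialize_object`. The sketch below shows one way that upstream division might look. `per_device_batch_size` is a hypothetical helper, and rejecting uneven splits is an assumption of this example rather than documented Composer behavior.

```python
def per_device_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global batch evenly across `world_size` devices (hypothetical helper)."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if global_batch_size % world_size != 0:
        # Assumption for this sketch: uneven splits are rejected rather than rounded.
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible by world size {world_size}"
        )
    return global_batch_size // world_size


# A global batch of 2048 samples spread over 8 devices.
batch_size = per_device_batch_size(global_batch_size=2048, world_size=8)
print(batch_size)  # prints 256
```

The resulting value is what would be passed as `batch_size`, so each device's dataloader yields only its own share of the global batch.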
- class composer.datasets.mnist.MNISTWebDatasetHparams(is_train=True, drop_last=True, shuffle=True, datadir=None, webdataset_cache_dir='/tmp/webdataset_cache/', webdataset_cache_verbose=False, shuffle_buffer=256, remote='s3://mosaicml-internal-dataset-mnist', name='mnist')
Bases: composer.datasets.hparams.WebDatasetHparams
Defines an instance of the MNIST WebDataset for image classification.
- Parameters
datadir (str) – The path to the data directory.
is_train (bool) – Whether to load the training data or validation data. Default: True.
drop_last (bool) – If the number of samples is not divisible by the batch size, whether to drop the last batch or pad the last batch with zeros. Default: True.
shuffle (bool) – Whether to shuffle the dataset. Default: True.
webdataset_cache_dir (str) – WebDataset cache directory.
webdataset_cache_verbose (bool) – WebDataset cache verbosity.
remote (str) – S3 bucket or root directory where dataset is stored.
name (str) – Key used to determine where dataset is cached on local filesystem.
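The `drop_last` option above changes how many batches an epoch yields when the sample count is not a multiple of the batch size. The sketch below counts batches for the 60,000-image MNIST training split. `batches_per_epoch` is a hypothetical helper. It models only the batch count, not the zero padding itself, and ignores any sharding across devices.

```python
def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Count batches per epoch with or without the final partial batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size
    # The incomplete final batch is kept (and, per the docs, padded), so round up.
    return (num_samples + batch_size - 1) // batch_size


MNIST_TRAIN_SAMPLES = 60_000  # size of the MNIST training split

with_drop = batches_per_epoch(MNIST_TRAIN_SAMPLES, 128, drop_last=True)
without_drop = batches_per_epoch(MNIST_TRAIN_SAMPLES, 128, drop_last=False)
leftover = MNIST_TRAIN_SAMPLES % 128
print(with_drop, without_drop, leftover)  # prints: 468 469 96
```

With a batch size of 128, dropping the last batch discards 96 samples per epoch, while keeping it adds one batch that needs 32 padded entries to reach full size.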
- initialize_object(batch_size, dataloader_hparams)
Creates a DataLoader or DataSpec for this dataset.
- Parameters
batch_size (int) – The size of the batch the dataloader should yield. This batch size is device-specific and already incorporates the world size.
dataloader_hparams (DataLoaderHparams) – The dataset-independent hparams for the dataloader.
- Returns
DataLoader or DataSpec – The DataLoader, or if the dataloader yields batches of custom types, a DataSpec.