SyntheticBatchPairDataset
- class composer.datasets.SyntheticBatchPairDataset(*, total_dataset_size, data_shape, num_unique_samples_to_create=100, data_type=SyntheticDataType.GAUSSIAN, label_type=SyntheticDataLabelType.CLASSIFICATION_INT, num_classes=None, label_shape=None, device='cpu', memory_format=MemoryFormat.CONTIGUOUS_FORMAT, transform=None)
Emulates a dataset of provided size and shape.
- Parameters
total_dataset_size (int) – The total size of the dataset to emulate.
data_shape (List[int]) – Shape of the tensor for input samples.
num_unique_samples_to_create (int) – The number of unique samples to allocate memory for.
data_type (str or SyntheticDataType, optional) – Type of synthetic data to create. Default: SyntheticDataType.GAUSSIAN.
label_type (str or SyntheticDataLabelType, optional) – Type of synthetic label to create. Default: SyntheticDataLabelType.CLASSIFICATION_INT.
num_classes (int, optional) – Number of classes to use. Required if label_type is CLASSIFICATION_INT or CLASSIFICATION_ONE_HOT. Default: None.
label_shape (List[int], optional) – Shape of the tensor for each sample label. Default: None.
device (str) – Device on which to store the sample pool. Set to 'cuda' to store samples on the GPU and eliminate PCIe transfers by the dataloader. Set to 'cpu' to move data between host memory and the GPU on every batch. Default: 'cpu'.
memory_format (composer.core.MemoryFormat, optional) – Memory format for the sample pool. Default: MemoryFormat.CONTIGUOUS_FORMAT.
transform (Callable, optional) – Transform(s) to apply to data. Default: None.
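The relationship between total_dataset_size and num_unique_samples_to_create can be illustrated with a small standard-library sketch. This is not Composer's implementation: the class name SyntheticPoolSketch, the seed argument, the flattened list representation of samples, and the wrap-around (modulo) index mapping are all assumptions made for illustration. It only shows the general idea of allocating a small pool of unique samples while reporting a much larger dataset length.

```python
import random
from typing import List, Optional, Tuple


class SyntheticPoolSketch:
    """Hypothetical stand-in (not Composer's class): a small pool reused to emulate a large dataset."""

    def __init__(
        self,
        *,
        total_dataset_size: int,
        data_shape: List[int],
        num_unique_samples_to_create: int = 100,
        num_classes: Optional[int] = None,
        seed: int = 0,  # hypothetical argument, added only to make the sketch reproducible
    ) -> None:
        # Mirrors the documented rule that integer class labels need num_classes.
        if num_classes is None:
            raise ValueError("num_classes is required for classification labels")
        self.total_dataset_size = total_dataset_size
        self.pool_size = num_unique_samples_to_create

        rng = random.Random(seed)
        numel = 1
        for dim in data_shape:
            numel *= dim

        # Memory is allocated for pool_size samples only, never for the full dataset.
        # Each sample is a flat list of Gaussian values standing in for a tensor.
        self.inputs = [
            [rng.gauss(0.0, 1.0) for _ in range(numel)] for _ in range(self.pool_size)
        ]
        self.labels = [rng.randrange(num_classes) for _ in range(self.pool_size)]

    def __len__(self) -> int:
        # The emulated size, not the pool size.
        return self.total_dataset_size

    def __getitem__(self, idx: int) -> Tuple[List[float], int]:
        if not 0 <= idx < self.total_dataset_size:
            raise IndexError(idx)
        # Assumption: indices wrap around the pool of unique samples.
        slot = idx % self.pool_size
        return self.inputs[slot], self.labels[slot]


dataset = SyntheticPoolSketch(
    total_dataset_size=1000,
    data_shape=[3, 4, 4],
    num_unique_samples_to_create=10,
    num_classes=5,
)
```

In this sketch, len(dataset) reports 1000 while only 10 samples are ever stored, so dataset[3] and dataset[13] return the same pooled sample. A small pool keeps memory use low at the cost of sample diversity, which is usually acceptable when the goal is to measure throughput rather than model quality.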