LMDatasetHparams

class composer.datasets.LMDatasetHparams(datadir: List[str], split: str, tokenizer_name: str, num_tokens: int = 0, seed: int = 5, subsample_ratio: float = 1.0, train_sequence_length: int = 1024, val_sequence_length: int = 1024, shuffle: bool = True, drop_last: bool = False)

Bases: composer.datasets.hparams.DatasetHparams

Defines a generic dataset class for autoregressive language models.

initialize_object() -> DataloaderSpec

Initializes a DataloaderSpec for this dataset.
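
To illustrate the constructor signature documented above, here is a minimal sketch that mirrors its fields in a plain dataclass. It does not import Composer: LMDatasetHparamsSketch is a hypothetical stand-in, and the path, split name, and tokenizer name are placeholder assumptions rather than values from the documentation. It only shows which arguments are required and which fall back to the listed defaults.

```python
from dataclasses import MISSING, dataclass, fields
from typing import List


@dataclass
class LMDatasetHparamsSketch:
    """Hypothetical stand-in that copies the documented signature.

    This is not the Composer class; it carries no loading logic.
    """

    datadir: List[str]
    split: str
    tokenizer_name: str
    num_tokens: int = 0
    seed: int = 5
    subsample_ratio: float = 1.0
    train_sequence_length: int = 1024
    val_sequence_length: int = 1024
    shuffle: bool = True
    drop_last: bool = False


# Placeholder values (assumptions for illustration only).
hparams = LMDatasetHparamsSketch(
    datadir=["/path/to/data"],
    split="train",
    tokenizer_name="gpt2",
)

# Fields without a default must be supplied by the caller.
required = [
    f.name
    for f in fields(hparams)
    if f.default is MISSING and f.default_factory is MISSING
]
print("required:", required)
print("train_sequence_length:", hparams.train_sequence_length)
```

With the real class, the equivalent object would presumably be passed through initialize_object() to obtain a DataloaderSpec; the sketch stops short of that because the method's behavior is not described here beyond its return type.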