LMDatasetHparams
- class composer.datasets.LMDatasetHparams(datadir: List[str], split: str, tokenizer_name: str, num_tokens: int = 0, seed: int = 5, subsample_ratio: float = 1.0, train_sequence_length: int = 1024, val_sequence_length: int = 1024, shuffle: bool = True, drop_last: bool = False)
Bases: composer.datasets.hparams.DatasetHparams
Defines a generic dataset class for autoregressive language models.
- initialize_object() → DataloaderSpec
Initializes a DataloaderSpec for this dataset.
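The constructor signature above has three required arguments (datadir, split, tokenizer_name); every other argument has a default. A minimal, library-free sketch makes that split easy to see. It assumes nothing beyond the documented signature: LMDatasetHparamsSketch is a hypothetical stand-in that mirrors the parameter names and defaults and loads no data. The directory path and tokenizer name are placeholder values chosen for illustration only.

```python
from dataclasses import asdict, dataclass
from typing import List


# Hypothetical stand-in mirroring the documented signature of
# composer.datasets.LMDatasetHparams. It only stores values; it does not
# read datasets, load a tokenizer, or build a DataloaderSpec.
@dataclass
class LMDatasetHparamsSketch:
    datadir: List[str]  # required
    split: str  # required
    tokenizer_name: str  # required
    num_tokens: int = 0
    seed: int = 5
    subsample_ratio: float = 1.0
    train_sequence_length: int = 1024
    val_sequence_length: int = 1024
    shuffle: bool = True
    drop_last: bool = False


# Supply only the required arguments; the rest keep the documented defaults.
# "/data/my_corpus" and "gpt2" are placeholders, not values from the docs.
hparams = LMDatasetHparamsSketch(
    datadir=["/data/my_corpus"],
    split="train",
    tokenizer_name="gpt2",
)
print(asdict(hparams))
```

With the real class, the same keyword arguments would be passed to composer.datasets.LMDatasetHparams, and initialize_object() would then be called to obtain the DataloaderSpec.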