GPT2Model
- class composer.models.GPT2Model(module: transformers.GPT2Model, config: transformers.GPT2Config, tokenizer_name: str)[source]
Bases:
composer.models.transformer_shared.MosaicTransformer
Implements a GPT-2 model by wrapping a Hugging Face GPT-2 module in a MosaicTransformer.
See this paper for details on the GPT-2 architecture.
- Parameters
module (transformers.GPT2Model) – The model to wrap with this module.
config (transformers.GPT2Config) – The config for the model.
tokenizer_name (str) – The name of the tokenizer used for this model, necessary to assert required model inputs.
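For example, the wrapper can be constructed directly from a Hugging Face GPT-2 module. This is a minimal sketch; the ``GPT2LMHeadModel`` variant and the ``"gpt2"`` tokenizer name are assumptions, chosen so that the wrapped module can return a language-modeling loss::

    import transformers
    from composer.models import GPT2Model

    config = transformers.GPT2Config.from_pretrained("gpt2")
    # Assumption: wrap the LM-head variant so Hugging Face can compute a loss.
    module = transformers.GPT2LMHeadModel(config)

    model = GPT2Model(module=module, config=config, tokenizer_name="gpt2")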
- loss(outputs: Mapping, batch: composer.core.types.Batch) composer.core.types.Tensors [source]
Computes the loss from the model outputs and the ground truth batch.
We don’t implement this for the generic Transformer abstraction, since loss functions are model- and objective-specific. A single model architecture could use many different loss functions, which are better left to the user to express; a sketch of one possible override follows this section.
- Parameters
outputs (Mapping) – The dictionary output from the model. It could contain the loss as computed by Hugging Face, or algorithms can pop the labels from the input in case they modify the loss function.
batch (Batch) – The set of ground truth labels to use to compute the loss against.
- Returns
The loss as a ``Tensors`` object.
- Raises
NotImplementedError – A model-specific and task-specific loss function must be written.
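A minimal sketch of a model- and task-specific override, assuming a causal language-modeling objective and dict-style ``outputs`` and ``batch`` that expose ``"logits"`` and ``"labels"`` keys (these key names are assumptions)::

    import torch.nn.functional as F
    from composer.models import GPT2Model

    class CausalLMGPT2(GPT2Model):
        def loss(self, outputs, batch):
            # Reuse the loss if the wrapped Hugging Face module already
            # computed it (i.e. labels were passed through the forward call).
            if outputs.get("loss") is not None:
                return outputs["loss"]
            # Otherwise compute a shifted next-token cross-entropy.
            logits = outputs["logits"][..., :-1, :].contiguous()
            labels = batch["labels"][..., 1:].contiguous()
            return F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))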
- metrics(train: bool = False) Metrics [source]
Get metrics for evaluating the model.
Downstream models should override this method if they would like to add task-specific metrics.
- Parameters
train (bool) – A boolean flag indicating whether to return training or validation metrics.
Warning
If train=True, the training loss may be calculated twice when algorithms override the loss function. This can be expensive due to the computational cost of softmax; caching strategies are worth exploring.
- Returns
A Metrics object that can be used to calculate task performance.
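A usage sketch for evaluation; the ``eval_dataloader``, the batch keys, and the torchmetrics-style ``update``/``compute`` calls are assumptions::

    import torch

    metrics = model.metrics(train=False)  # validation metric collection

    model.eval()
    with torch.no_grad():
        for batch in eval_dataloader:
            outputs = model(batch)
            # Assumed update signature: predictions followed by targets.
            metrics.update(outputs["logits"], batch["labels"])

    print(metrics.compute())  # aggregated results over the evaluation set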