composer.models.gpt2.model#

GPT-2 model based on Hugging Face GPT-2.

Implemented as a wrapper using ComposerTrainer.

Functions

create_gpt2

Implements HuggingFaceModel to wrap Hugging Face GPT-2 transformers. Logs training and

composer.models.gpt2.model.create_gpt2(use_pretrained=False, pretrained_model_name=None, model_config=None, tokenizer_name=None, gradient_checkpointing=False)[source]#
Implements HuggingFaceModel to wrap Hugging Face GPT-2 transformers. Logs training and

validation perplexity.

From Language Models are Unsupervised Multitask Learners (Radford et al, 2018).

Args:

gradient_checkpointing (bool, optional): Use gradient checkpointing. Default: False. use_pretrained (bool, optional): Whether to initialize the model with the pretrained weights. Default: False. model_config (dict): A dictionary providing a HuggingFace model configuration. tokenizer_name (str, optional): Tokenizer name used to preprocess the dataset and validate the models inputs.

{
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": ["GPT2LMHeadModel"],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
  "text-generation": {
  "do_sample": true,
  "max_length": 50 }
  },
  "transformers_version": "4.16.0",
  "use_cache": true,
  "vocab_size": 50257
}

To create a GPT-2 model for language modeling pretraining:

from composer.models import create_gpt2

composer_model = create_gpt2()