composer.functional.apply_alibi(model, max_sequence_length, optimizers=None)

Removes position embeddings and replaces the attention function and attention mask as specified by ALiBi. Note that the majority of the training speed-up from using ALiBi comes from being able to train on shorter sequence lengths; this function does not scale the training sequence length as the ALiBi method does, so little speed-up will be observed from using it alone. See the Method Card for more details. This function should be called after the model is instantiated and before training begins.


import composer.functional as cf

Parameters:
  • model (Module) – Model to transform.

  • max_sequence_length (int) –

    Maximum sequence length that the model will be able to accept. Internally, the transformations applied by ALiBi change sequence-shaped tensors to handle sequences up to max_sequence_length. Depending on max_sequence_length and the model, these changes could increase or decrease the model's maximum sequence length.

    At minimum, max_sequence_length should be set to the sequence length used during training. However, if evaluating on sequence lengths longer than those used in training, max_sequence_length should be set accordingly.

    Note that a larger max_sequence_length means a larger memory footprint for the model, so it is best to set this parameter equal to the longest sequence length that will be seen during training and/or evaluation.

  • optimizers (Optimizer | Sequence[Optimizer], optional) –

    Existing optimizers bound to model.parameters(). All optimizers that have already been constructed with model.parameters() must be specified here so they will optimize the correct parameters.

    If the optimizer(s) will be constructed after calling this function, it is safe to omit this parameter; optimizers constructed afterward are automatically bound to the correct model parameters.