Alibi#

class composer.algorithms.Alibi(max_sequence_length, train_sequence_length_scaling=0.25)[source]#

ALiBi (Attention with Linear Biases; Press et al., 2021) dispenses with position embeddings and instead directly biases attention matrices such that nearby tokens attend to one another more strongly.

ALiBi yields excellent extrapolation to unseen sequence lengths compared to other position embedding schemes. We leverage this extrapolation capability by training with shorter sequence lengths, which reduces the memory and computation load.
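The bias itself is simple to state: in the common case where the number of heads is a power of two, head \(h\) of \(n\) uses a slope \(m_h = 2^{-8h/n}\), and the attention score between positions \(i\) and \(j\) is penalized by \(m_h \cdot |i - j|\) before the softmax, so distant tokens contribute less. The sketch below illustrates this construction; it is not Composer's internal implementation, and the function names are illustrative:

import torch

def alibi_slopes(n_heads):
    # Head-specific slopes from the ALiBi paper: m_h = 2 ** (-8 * h / n_heads).
    return torch.tensor([2.0 ** (-8.0 * h / n_heads) for h in range(1, n_heads + 1)])

def build_alibi_bias(n_heads, seq_len):
    # |i - j| distance between every query position i and key position j.
    positions = torch.arange(seq_len)
    distances = (positions[None, :] - positions[:, None]).abs()
    # Each head scales the distances by its own slope; the result is added
    # to the raw attention scores (shape: n_heads x seq_len x seq_len).
    return -alibi_slopes(n_heads)[:, None, None] * distances

# scores: (batch, n_heads, seq_len, seq_len) attention logits
# scores = scores + build_alibi_bias(n_heads, seq_len)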

This algorithm runs on Event.INIT to modify the model before the model has been moved to accelerators. It also runs on Event.AFTER_DATALOADER to modify the shape of a batch of data after the model and data have been moved to accelerators.
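The batch modification at Event.AFTER_DATALOADER trades sequence length for batch size so that the total number of tokens per batch is preserved. A minimal sketch of the idea, assuming token IDs laid out as \((batch, sequence\_length)\); scale_sequence_length is a hypothetical helper for illustration, not Composer's internal code:

import torch

def scale_sequence_length(input_ids, scaling=0.25):
    # input_ids: (batch, sequence_length). Each sequence is split into
    # 1 / scaling contiguous chunks, and each chunk becomes its own row,
    # so the total token count in the batch is unchanged.
    batch_size, seq_len = input_ids.shape
    new_seq_len = int(seq_len * scaling)
    return input_ids.reshape(batch_size * (seq_len // new_seq_len), new_seq_len)

long_batch = torch.randint(0, 30522, (8, 1024))   # 8 sequences of 1024 tokens
short_batch = scale_sequence_length(long_batch)   # shape (32, 256)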

See the Method Card for more details.

Example:

from composer.algorithms import Alibi
from composer.trainer import Trainer

alibi = Alibi(
    max_sequence_length=512,
    train_sequence_length_scaling=0.25,
)

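# model (a ComposerModel) and train_dataloader are assumed to be defined elsewhere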
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="1ep",
    algorithms=[alibi]
)

Parameters
  • max_sequence_length (int) – Maximum sequence length that the model will be able to accept. Setting this is sometimes necessary for evaluating on sequence lengths longer than the model was initialized to accommodate.

  • train_sequence_length_scaling (float, optional) – Amount by which to scale training sequence length. One batch of training data will be reshaped from shape \((sequence\_length, batch)\) to \((sequence\_length \times train\_sequence\_length\_scaling, \frac{batch}{train\_sequence\_length\_scaling})\). Default: 0.25.
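As a concrete instance of this reshaping: with train_sequence_length_scaling=0.25, a batch of shape \((1024, 8)\) becomes \((1024 \times 0.25, \frac{8}{0.25}) = (256, 32)\), so each training sequence is four times shorter while the batch contains four times as many sequences and the total token count is unchanged.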