apply_alibi
- composer.functional.apply_alibi(model, max_sequence_length, optimizers=None)
- Removes position embeddings and replaces the attention function and attention mask as per ALiBi. Note that the majority of the training speed-up from using ALiBi comes from being able to train on shorter sequence lengths; this function does not scale the training sequence length as ALiBi does, so little speedup will be observed from using it alone. See the Method Card for more details. This function should be called after the model is instantiated and before training begins.

  Example:

  ```python
  import composer.functional as cf

  cf.apply_alibi(
      model=model,
      max_sequence_length=512,
  )
  ```

  Parameters
- `model` (Module) – Model to transform.
- `max_sequence_length` (int) – Maximum sequence length that the model will be able to accept. Internally, the transformations applied by ALiBi change sequence-shaped tensors to handle sequences up to `max_sequence_length`. Depending on `max_sequence_length` and `model`, these changes could increase or decrease the model's maximum sequence length.

  At minimum, `max_sequence_length` should be set to the sequence length used during training. However, if evaluating on sequence lengths longer than those used in training, `max_sequence_length` should be set accordingly.

  Note that a larger `max_sequence_length` means a larger memory footprint for the model. So, it is best to set this parameter equal to the longest sequence length that will be seen during training and/or evaluation, as in the first sketch after this parameter list.
- `optimizers` (Optimizer | Sequence[Optimizer], optional) – Existing optimizers bound to `model.parameters()`. All optimizers that have already been constructed with `model.parameters()` must be specified here so that they will optimize the correct parameters.

  If the optimizer(s) are constructed after calling this function, then it is safe to omit this parameter. These optimizers will see the correct model parameters. Both orderings are shown in the second sketch after this parameter list.
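The sizing guidance for `max_sequence_length` is easiest to see in code. Below is a minimal sketch; the `BertForMaskedLM` class and the `bert-base-uncased` checkpoint are illustrative assumptions, not requirements of the API:

```python
# Minimal sketch: size max_sequence_length for the longest sequences the
# model will see. The BERT model and checkpoint are assumptions chosen for
# illustration; any model supported by the ALiBi surgery would work.
import composer.functional as cf
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Training uses length-512 sequences, but evaluation will see sequences up
# to 1024 tokens, so the transformation is sized for 1024.
cf.apply_alibi(model=model, max_sequence_length=1024)
```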
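And a sketch of the two optimizer orderings described under `optimizers`, again assuming a hypothetical BERT setup, with `AdamW` standing in for any optimizer bound to the model's parameters:

```python
# Sketch of the two valid optimizer orderings; the model setup is assumed.
import torch
import composer.functional as cf
from transformers import BertForMaskedLM

# Case 1: the optimizer already exists, so it must be passed to
# apply_alibi so that it tracks the post-surgery parameters.
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
cf.apply_alibi(model=model, max_sequence_length=512, optimizers=optimizer)

# Case 2 (simpler): apply the surgery first, then construct the optimizer.
# The `optimizers` argument can be omitted because the optimizer is built
# against the already-transformed parameters.
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
cf.apply_alibi(model=model, max_sequence_length=512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```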