Module utils.





  • AlibiReplacementFunction

  • Callable

  • Dict

  • Optional

  • Type

  • log

  • policy_registry

class composer.algorithms.alibi.attention_surgery_functions.utils.PolicyRegistry[source]#

Bases: Dict[Type[torch.nn.modules.module.Module], Callable[[torch.nn.modules.module.Module, int, int], Optional[torch.nn.modules.module.Module]]]

A registry mapping for ALiBi surgery.


This decorator registers mappings from torch module types to their ALiBi surgery functions.

To accommodate the specifics of composer's module surgery, our ALiBi implementation uses a registry to create a Mapping[torch.nn.Module, AlibiReplacementFunction], where AlibiReplacementFunction is any function that has a ReplacementFunction signature but with an additional max_sequence_length argument.

Implementation files (e.g., _gpt2.py) populate policy_registry (an instance of this class) by defining instances of AlibiReplacementFunction functions and decorating them with policy_registry.register() (this method). One or more Type[torch.nn.Module] source classes must be supplied as inputs to the decorator, which tells policy_registry to map those classes to the decorated function.


import torch
from composer.algorithms.alibi.attention_surgery_functions.utils import policy_registry
from transformers.models.gpt2.modeling_gpt2 import GPT2Attention

@policy_registry.register(GPT2Attention)
def convert_gpt2_attention(module: torch.nn.Module, index: int, max_sequence_length: int):
    # Do surgery (change ``module`` or generate a new ``module`` instance to return)
    # Note that this function should depend on ``max_sequence_length``
    return module

In the above example, convert_gpt2_attention (an instance of an AlibiReplacementFunction) is decorated with @policy_registry.register(GPT2Attention). Using the decorator this way instructs the ALiBi algorithm to apply surgery to any instance of GPT2Attention within the model using convert_gpt2_attention (the decorated function).

Note that convert_gpt2_attention follows the specific signature of an AlibiReplacementFunction. policy_registry.register() will raise an exception if it is used to decorate a function that does not follow this signature. The requirements are:

  • The function takes 3 input arguments

  • Argument 1 has type torch.nn.Module

  • Argument 2 has type int

  • Argument 3 is named max_sequence_length and has type int
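These requirements can be checked mechanically with the standard-library inspect module. The helper below is a hypothetical sketch of such a check (the actual validation lives inside policy_registry.register(); the function name here is an assumption for illustration):

```python
import inspect

import torch

def validate_alibi_replacement_signature(func) -> None:
    # Hypothetical helper mirroring the three requirements listed above;
    # not Composer's actual implementation.
    params = list(inspect.signature(func).parameters.values())
    if len(params) != 3:
        raise TypeError(f'{func.__name__} must take exactly 3 arguments')
    if params[0].annotation is not torch.nn.Module:
        raise TypeError('argument 1 must be annotated as torch.nn.Module')
    if params[1].annotation is not int:
        raise TypeError('argument 2 must be annotated as int')
    if params[2].name != 'max_sequence_length' or params[2].annotation is not int:
        raise TypeError("argument 3 must be named 'max_sequence_length' and annotated as int")
```

A function such as convert_gpt2_attention above passes this check, while one whose third argument is named differently (or lacks the int annotation) would be rejected.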

To better understand these requirements, it may be helpful to review composer's module surgery (composer/utils/module_surgery.py) and the way ALiBi's implementation uses policy_registry in composer.algorithms.alibi.apply_alibi().

composer.algorithms.alibi.attention_surgery_functions.utils.register_alibi(module, n_heads, max_token_length, causal)[source]#

Adds ALiBi's linear attention biases as a buffer to the module.
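The bias buffer is built from per-head slopes. The helper below sketches the standard slope formula from the ALiBi paper (a geometric sequence starting at 2**(-8/n) for n heads, with an interleaving scheme when n is not a power of two); the function name is hypothetical and this is not the library's exact code:

```python
import math

def alibi_slopes(n_heads: int) -> list:
    # Hypothetical sketch of the standard ALiBi slope computation.
    def slopes_power_of_2(n):
        start = 2 ** (-(2 ** -(math.log2(n) - 3)))  # equals 2 ** (-8 / n)
        return [start * (start ** i) for i in range(n)]

    if math.log2(n_heads).is_integer():
        return slopes_power_of_2(n_heads)
    # For a non-power-of-two head count, take all slopes for the nearest
    # lower power of two, then interleave slopes from the next power of two.
    closest = 2 ** math.floor(math.log2(n_heads))
    return (slopes_power_of_2(closest)
            + slopes_power_of_2(2 * closest)[0::2][: n_heads - closest])
```

For 8 heads this yields slopes 2**-1, 2**-2, ..., 2**-8; each head's bias is then the slope times the (negative) key-query distance.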

composer.algorithms.alibi.attention_surgery_functions.utils.zero_and_freeze_expand_position_embeddings(module, max_sequence_length, position_embedding_attribute)[source]#

Replaces weights with zero tensor and prevents them from being learned further.

This is intended to be used specifically for "removing" positional embeddings.
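The core "zero and freeze" idea can be sketched as follows. This is a simplified, hypothetical helper, not the library function itself (which additionally expands the embedding table to max_sequence_length and looks the embedding up by attribute name on the module):

```python
import torch

def zero_and_freeze_embedding(emb: torch.nn.Embedding) -> None:
    # Overwrite the embedding weights with zeros so positional information
    # contributes nothing, then stop gradient flow so the weights stay zero
    # during training. (Simplified sketch of the documented behavior.)
    with torch.no_grad():
        emb.weight.zero_()
    emb.weight.requires_grad_(False)
```

For example, applying this to a model's learned position-embedding table effectively disables it, leaving ALiBi's attention biases as the sole source of positional information.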