composer.algorithms.gated_linear_units.gated_linear_units

Functions

apply_gated_linear_units

Replaces the Linear layers in the feed-forward network with Gated Linear Units.

from_BertIntermediate

Defines a replacement policy from a transformers.models.bert.modeling_bert.BertIntermediate to a torch.nn.Identity. The identity effectively acts as a no-op.

from_BertOutput

Defines a replacement policy from a transformers.models.bert.modeling_bert.BertOutput to a composer.algorithms.gated_linear_units.gated_linear_unit_layers.BERTGatedFFOutput.

Classes

Algorithm

Base class for algorithms.

BERTGatedFFOutput

Defines a single feed-forward block that uses Gated Linear Units.

BertForMaskedLM

Bert Model with a language modeling head on top.

BertForSequenceClassification

Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output).

BertIntermediate

transformers.models.bert.modeling_bert.BertIntermediate

BertOutput

transformers.models.bert.modeling_bert.BertOutput

Event

Enum to represent training loop events.

GatedLinearUnits

Replaces all instances of Linear layers in the feed-forward subnetwork with a Gated Linear Unit.

HuggingFaceModel

A wrapper class that converts 🤗 Transformers models to composer models.

Logger

An interface to record training data.

State

The state of the trainer.

Exceptions

MissingConditionalImportError

Handles errors for external packages that might not be installed.

NoEffectWarning

Warns when an algorithm did not have an effect.

Attributes

  • Callable

  • Dict

  • IS_TRANSFORMERS_INSTALLED

  • Optional

  • Sequence

  • Type

  • Union

  • annotations

  • log

class composer.algorithms.gated_linear_units.gated_linear_units.GatedLinearUnits(act_fn=None, gated_layer_bias=False, non_gated_layer_bias=False)

Bases: composer.core.algorithm.Algorithm

Replaces all instances of Linear layers in the feed-forward subnetwork with a Gated Linear Unit. The Gated Linear Units provide a more expressive form for the same number of parameters, at the cost of a slight degradation in throughput.

Runs on INIT, so it can swap the Linear layers in the FFN for GLUs before the model is DDP-wrapped.

Parameters
  • act_fn (Callable[[Tensor], Tensor], optional) – The activation function to use. If None, the algorithm will use the existing activation function in the model.

  • gated_layer_bias (bool, optional) – Whether to use a bias in the gated linear layer within the GLU. Default: False.

  • non_gated_layer_bias (bool, optional) – Whether to use a bias in the non-gated linear layer within the GLU. Default: False.
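
To make the roles of act_fn and the two bias flags concrete, here is a minimal, framework-free sketch of a gated feed-forward computation on a single input vector. It is an illustration under stated assumptions, not Composer's implementation: the helper names (linear, gated_linear_unit, w_gated, w_non_gated, w_out) are hypothetical, GELU is chosen only as an example activation, and dropout, the residual connection and layer normalization are left out.

```python
import math


def linear(x, weight, bias=None):
    """Compute weight @ x (+ bias) for one input vector x."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


def gelu(v):
    """Exact GELU; used here only as an example activation."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def gated_linear_unit(x, w_gated, w_non_gated, w_out, act_fn=gelu,
                      b_gated=None, b_non_gated=None):
    """Hypothetical GLU feed-forward: w_out @ (act_fn(gated) * non_gated)."""
    # Gated branch: projection followed by the activation function.
    gated = [act_fn(v) for v in linear(x, w_gated, b_gated)]
    # Non-gated branch: plain projection, no activation.
    non_gated = linear(x, w_non_gated, b_non_gated)
    # Element-wise product of the two branches, then project back down.
    hidden = [g * n for g, n in zip(gated, non_gated)]
    return linear(hidden, w_out)


# Toy sizes: embedding width 2, feed-forward width 3.
x = [1.0, -2.0]
w_gated = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
w_non_gated = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
w_out = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

y = gated_linear_unit(x, w_gated, w_non_gated, w_out)
```

In this sketch, leaving b_gated or b_non_gated as None corresponds to gated_layer_bias=False or non_gated_layer_bias=False, and passing a different callable as act_fn changes only the gated branch.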

Example

from composer import Trainer
from composer.algorithms import GatedLinearUnits

# `model`, `train_dataloader`, and `optimizer` are assumed to be defined already.
algorithm = GatedLinearUnits()
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="1ep",
    algorithms=[algorithm],
    optimizers=[optimizer],
)

composer.algorithms.gated_linear_units.gated_linear_units.apply_gated_linear_units(model, optimizers, act_fn=None, gated_layer_bias=False, non_gated_layer_bias=False)

Replaces the Linear layers in the feed-forward network with Gated Linear Units.

Parameters
  • model (torch.nn.Module) – The model to modify in-place.

  • optimizers (torch.optim.Optimizer | Sequence[torch.optim.Optimizer], optional) –

    Existing optimizers bound to model.parameters(). All optimizers that have already been constructed with model.parameters() must be specified here so that they will optimize the correct parameters.

    If the optimizer(s) are constructed after calling this function, then it is safe to omit this parameter. These optimizers will see the correct model parameters.

  • act_fn (Callable[[Tensor], Tensor], optional) – The activation function to use. If None, the algorithm will use the existing activation function in the model.

  • gated_layer_bias (bool, optional) – Whether to use a bias in the gated linear layer within the GLU. Default: False.

  • non_gated_layer_bias (bool, optional) – Whether to use a bias in the non-gated linear layer within the GLU. Default: False.
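
The note on optimizers comes down to object references: an optimizer built before the layer swap still points at the parameters of the layers that were replaced. The toy sketch below shows that effect without any framework. ToyParam, ToyOptimizer, parameters and untracked are hypothetical stand-ins invented for this illustration; they are not PyTorch or Composer classes, and the parameter names are made up.

```python
class ToyParam:
    """Hypothetical stand-in for a trainable tensor; only object identity matters."""

    def __init__(self, name):
        self.name = name


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer: it keeps references to parameters."""

    def __init__(self, params):
        self.params = list(params)


def parameters(model):
    """Flatten a {layer_name: [params]} dict into one list of parameters."""
    return [p for layer in model.values() for p in layer]


def untracked(model, optimizer):
    """Names of model parameters that the optimizer holds no reference to."""
    tracked = {id(p) for p in optimizer.params}
    return [p.name for p in parameters(model) if id(p) not in tracked]


model = {"ffn": [ToyParam("ffn.dense.weight")], "head": [ToyParam("head.weight")]}
optimizer = ToyOptimizer(parameters(model))  # built BEFORE the layer swap

# Layer swap: the feed-forward layer is replaced, which creates new parameters.
model["ffn"] = [ToyParam("ffn.gated.weight"), ToyParam("ffn.non_gated.weight")]
stale = untracked(model, optimizer)  # new parameters the old optimizer misses

# An optimizer built AFTER the swap sees every current parameter.
fresh_optimizer = ToyOptimizer(parameters(model))
```

Here stale lists the two new feed-forward parameters, which is the situation that passing existing optimizers to apply_gated_linear_units is meant to avoid. The fresh_optimizer case corresponds to constructing the optimizer after the call, where the argument can be omitted.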

composer.algorithms.gated_linear_units.gated_linear_units.from_BertIntermediate(layer, module_index)

Defines a replacement policy from a transformers.models.bert.modeling_bert.BertIntermediate to a torch.nn.Identity. The identity effectively acts as a no-op.

composer.algorithms.gated_linear_units.gated_linear_units.from_BertOutput(layer, module_index, act_fn, gated_layer_bias=False, non_gated_layer_bias=False)

Defines a replacement policy from a transformers.models.bert.modeling_bert.BertOutput to a composer.algorithms.gated_linear_units.gated_linear_unit_layers.BERTGatedFFOutput.
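
The two policies work as a pair: once the intermediate module becomes an identity, the replacement output module receives the un-expanded hidden states and performs the whole feed-forward computation itself. The framework-free sketch below mimics that arrangement. Every class and function in it (ToyIntermediate, ToyOutput, ToyIdentity, ToyGatedOutput, ToyLayer, replace_children) is a hypothetical stand-in with made-up arithmetic, and the index passed to each policy is a toy value rather than Composer's module_index semantics.

```python
class ToyIntermediate:
    """Stand-in for the expanding layer: doubles the vector length."""

    def __call__(self, x):
        return [v for v in x for _ in range(2)]


class ToyOutput:
    """Stand-in for the contracting layer: sums pairs, adds the residual."""

    def __call__(self, hidden, residual):
        return [hidden[2 * i] + hidden[2 * i + 1] + r for i, r in enumerate(residual)]


class ToyIdentity:
    """No-op replacement for the intermediate layer."""

    def __call__(self, x):
        return x


class ToyGatedOutput:
    """Replacement output layer: consumes un-expanded input directly."""

    def __call__(self, hidden, residual):
        # Made-up "gated" arithmetic: one branch times the other, plus residual.
        return [h * h + r for h, r in zip(hidden, residual)]


class ToyLayer:
    """Feed-forward part of a layer: output(intermediate(x), x)."""

    def __init__(self):
        self.intermediate = ToyIntermediate()
        self.output = ToyOutput()

    def feed_forward(self, x):
        return self.output(self.intermediate(x), x)


def replace_children(layer, policies):
    """Apply the policy registered for each child's type, if there is one."""
    for index, name in enumerate(("intermediate", "output")):
        child = getattr(layer, name)
        policy = policies.get(type(child))
        if policy is not None:
            setattr(layer, name, policy(child, index))


policies = {
    ToyIntermediate: lambda layer, module_index: ToyIdentity(),
    ToyOutput: lambda layer, module_index: ToyGatedOutput(),
}

layer = ToyLayer()
before = layer.feed_forward([1.0, 2.0])
replace_children(layer, policies)
after = layer.feed_forward([1.0, 2.0])
```

The call structure of feed_forward is unchanged by the swap; only the two child modules differ, which is why a pair of per-class policies is enough to describe the change.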