# composer.algorithms.gated_linear_units.gated_linear_units

Functions

• apply_gated_linear_units – Replaces the Linear layers in the feed-forward network with Gated Linear Units.

• from_BertIntermediate – Defines a replacement policy from a transformers.models.bert.modeling_bert.BertIntermediate to a torch.nn.Identity. The identity effectively acts as a no-op.

• from_BertOutput – Defines a replacement policy from a transformers.models.bert.modeling_bert.BertOutput to a composer.algorithms.gated_linear_units.gated_linear_unit_layers.BERTGatedFFOutput.

Classes

• Algorithm – Base class for algorithms.

• BERTGatedFFOutput – Defines a single feed-forward block that uses Gated Linear Units.

• BERTModel – BERT model based on 🤗 Transformers.

• BertIntermediate – transformers.models.bert.modeling_bert.BertIntermediate

• BertOutput – transformers.models.bert.modeling_bert.BertOutput

• Event – Enum to represent training loop events.

• GatedLinearUnits – Replaces all instances of Linear layers in the feed-forward subnetwork with a Gated Linear Unit.

• Logger – An interface to record training data.

• State – The state of the trainer.

Exceptions

• MissingConditionalImportError – Handles errors for external packages that might not be installed.

• NoEffectWarning – Warns when an algorithm did not have an effect.

Attributes

• Callable

• Dict

• IS_TRANSFORMERS_INSTALLED

• Optional

• Sequence

• Type

• Union

• annotations

• log

class composer.algorithms.gated_linear_units.gated_linear_units.GatedLinearUnits(act_fn=None, gated_layer_bias=False, non_gated_layer_bias=False)

Replaces all instances of Linear layers in the feed-forward subnetwork with a Gated Linear Unit. The Gated Linear Units provide a more expressive form for the same number of parameters, at the cost of a slight degradation in throughput.

Runs on INIT, so it can swap the Linear layers in the FFN for GLUs before the model is DDP wrapped.
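As a rough illustration of the description above, a standard feed-forward block and a gated one can be written as follows. The symbols are assumptions introduced here, not names from the library: \sigma is the activation function (act_fn), W_1 and W_o are the usual input and output projections, W_g is the gated projection, and W_n is the non-gated projection. Biases, dropout, the residual connection, and layer normalization are omitted for brevity.

```latex
\begin{aligned}
\mathrm{FFN}(x) &= W_o\,\sigma(W_1 x) \\
\mathrm{FFN}_{\mathrm{GLU}}(x) &= W_o\,\bigl(\sigma(W_g x) \odot W_n x\bigr)
\end{aligned}
```

Here \odot denotes element-wise multiplication, so the activated projection acts as a gate on the second, purely linear projection.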

Parameters
• act_fn (Callable[[Tensor], Tensor], optional) – Optionally, the activation function to use. If None, the algorithm will use the existing activation function in the model.

• gated_layer_bias (bool, optional) – Whether to use a bias in the gated linear layer within the GLU. Default: False.

• non_gated_layer_bias (bool, optional) – Whether to use a bias in the non-gated linear layer within the GLU. Default: False.

Example

from composer import Trainer
from composer.algorithms import GatedLinearUnits

# `model` and `optimizer` are assumed to be defined already.
algorithm = GatedLinearUnits()
trainer = Trainer(
    model=model,
    max_duration="1ep",
    algorithms=[algorithm],
    optimizers=[optimizer],
)

composer.algorithms.gated_linear_units.gated_linear_units.apply_gated_linear_units(model, optimizers, act_fn=None, gated_layer_bias=False, non_gated_layer_bias=False)

Replaces the Linear layers in the feed-forward network with Gated Linear Units.

Parameters
• model (torch.nn.Module) – The model to modify in-place.

• optimizers (torch.optim.Optimizer | Sequence[torch.optim.Optimizer], optional) –

Existing optimizers bound to model.parameters(). All optimizers that have already been constructed with model.parameters() must be specified here so that they will optimize the correct parameters.

If the optimizer(s) are constructed after calling this function, then it is safe to omit this parameter. These optimizers will see the correct model parameters.

• act_fn (Callable[[Tensor], Tensor], optional) – Optionally, the activation function to use. If None, the algorithm will use the existing activation function in the model.

• gated_layer_bias (bool, optional) – Whether to use a bias in the gated linear layer within the GLU. Default: False.

• non_gated_layer_bias (bool, optional) – Whether to use a bias in the non-gated linear layer within the GLU. Default: False.
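The note on the optimizers parameter can be illustrated with plain PyTorch. The sketch below does not call Composer. It swaps a layer by hand as a stand-in for module surgery and shows that an optimizer built before the swap does not track the new parameters, while one built afterwards does. The helper tracked_ids and the toy model are hypothetical names used only for this illustration.

```python
import torch

# Toy model standing in for a network whose layers get replaced.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 4),
)

# Optimizer constructed BEFORE the layer swap.
early_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Hand-rolled "surgery": replace the first layer with a fresh module.
model[0] = torch.nn.Linear(4, 8, bias=False)
new_weight = model[0].weight


def tracked_ids(optimizer):
    """Return the ids of all parameter tensors the optimizer updates."""
    return {id(p) for group in optimizer.param_groups for p in group["params"]}


# The early optimizer still references the old, now-unused parameters.
early_tracks_new = id(new_weight) in tracked_ids(early_optimizer)

# An optimizer constructed AFTER the swap sees the new parameters.
late_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
late_tracks_new = id(new_weight) in tracked_ids(late_optimizer)

print(early_tracks_new, late_tracks_new)
```

This is why optimizers that already exist should be passed to the function, and why optimizers created afterwards need no special handling.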

composer.algorithms.gated_linear_units.gated_linear_units.from_BertIntermediate(layer, module_index)

Defines a replacement policy from a transformers.models.bert.modeling_bert.BertIntermediate to a torch.nn.Identity. The identity effectively acts as a no-op.

composer.algorithms.gated_linear_units.gated_linear_units.from_BertOutput(layer, module_index, act_fn, gated_layer_bias=False, non_gated_layer_bias=False)

Defines a replacement policy from a transformers.models.bert.modeling_bert.BertOutput to a composer.algorithms.gated_linear_units.gated_linear_unit_layers.BERTGatedFFOutput.
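To make the two replacement policies more concrete, here is a minimal PyTorch sketch of a gated feed-forward output block. It is an illustration under stated assumptions, not the library's BERTGatedFFOutput. The class name GatedFeedForwardSketch, its constructor arguments, and the default values are hypothetical. The idea it shows is that the gated block performs both the expansion and the projection back to the embedding size, which would explain why the intermediate layer can be replaced by an identity.

```python
import torch


class GatedFeedForwardSketch(torch.nn.Module):
    """Hypothetical gated feed-forward block (illustration only)."""

    def __init__(self, d_embed, d_ff, act_fn, dropout_rate=0.1,
                 layernorm_eps=1e-12, gated_layer_bias=False,
                 non_gated_layer_bias=False):
        super().__init__()
        # Projection whose output passes through the activation (the gate).
        self.gated_layer = torch.nn.Linear(d_embed, d_ff, bias=gated_layer_bias)
        # Purely linear projection that the gate multiplies.
        self.non_gated_layer = torch.nn.Linear(d_embed, d_ff, bias=non_gated_layer_bias)
        # Projection back to the embedding dimension.
        self.wo = torch.nn.Linear(d_ff, d_embed)
        self.dropout = torch.nn.Dropout(dropout_rate)
        self.act = act_fn
        self.layernorm = torch.nn.LayerNorm(d_embed, eps=layernorm_eps)

    def forward(self, hidden_states, residual):
        # Element-wise gating: act(W_g x) * (W_n x).
        gated = self.act(self.gated_layer(hidden_states)) * self.non_gated_layer(hidden_states)
        out = self.wo(self.dropout(gated))
        # Residual connection followed by layer normalization.
        return self.layernorm(out + residual)


block = GatedFeedForwardSketch(d_embed=16, d_ff=64, act_fn=torch.nn.functional.gelu)
x = torch.randn(2, 5, 16)  # (batch, sequence, embedding)
y = block(x, x)            # output keeps the input shape
```

With both bias flags left at False, neither the gated nor the non-gated projection has a bias term. The output projection in this sketch keeps its default bias.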