apply_gated_linear_units#

composer.functional.apply_gated_linear_units(model, optimizers, act_fn=None, gated_layer_bias=False, non_gated_layer_bias=False)[source]#

Replaces the Linear layers in the model's feed-forward networks with Gated Linear Units.

Parameters
  • model (torch.nn.Module) – The model to modify in-place.

  • optimizers (torch.optim.Optimizer | Sequence[torch.optim.Optimizer], optional) –

    Existing optimizers bound to model.parameters(). Any optimizer that was already constructed from model.parameters() must be passed here so that it optimizes the correct (post-surgery) parameters.

    If the optimizer(s) are constructed after calling this function, this parameter can safely be omitted; those optimizers will see the correct model parameters. See the usage sketch below.

  • act_fn (Callable[[Tensor], Tensor], optional) – The activation function to use. If None, the algorithm keeps the model's existing activation function.

  • gated_layer_bias (bool, optional) – Whether to use a bias in the gated linear layer of each GLU. Default: False.

  • non_gated_layer_bias (bool, optional) – Whether to use a bias in the non-gated linear layer of each GLU. Default: False.
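
A minimal usage sketch follows. The Hugging Face transformers import and the bert-base-uncased checkpoint are assumptions for illustration; any model whose feed-forward blocks the surgery recognizes would work. Because the optimizer here is constructed after the surgery, optimizers can be passed as None:

```python
import torch
import composer.functional as cf
from transformers import BertForMaskedLM  # assumed example model

# Assumed model: a BERT-style network with standard feed-forward blocks.
model = BertForMaskedLM.from_pretrained('bert-base-uncased')

# Replace the feed-forward Linear layers with Gated Linear Units.
# No optimizer has been constructed yet, so optimizers=None is safe.
cf.apply_gated_linear_units(model, optimizers=None)

# Constructed after the surgery, the optimizer sees the new GLU parameters.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```

If an optimizer had been built from model.parameters() before the call, it would instead be passed via the optimizers argument so its parameter references are updated.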