GatedLinearUnits
- class composer.algorithms.GatedLinearUnits(act_fn=None, gated_layer_bias=False, non_gated_layer_bias=False)
Replaces all instances of Linear layers in the feed-forward subnetwork with a Gated Linear Unit. The Gated Linear Units provide a more expressive form for the same number of parameters, at the cost of a slight degradation in throughput.
Runs on Event.INIT, so it can swap the Linear layers in the FFN for GLUs before the model is DDP-wrapped.
- Parameters
act_fn (Callable[[Tensor], Tensor], optional) – Optionally, the activation function to use. If None, the algorithm will use the existing activation function in the model.
gated_layer_bias (bool, optional) – Whether to use a bias in the gated linear layer of the GLU. Default: False.
non_gated_layer_bias (bool, optional) – Whether to use a bias in the non-gated linear layer of the GLU. Default: False.
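For intuition, the GLU block that replaces a standard feed-forward Linear layer computes an element-wise gate over two parallel projections. The sketch below is illustrative only (module and attribute names such as GLUFeedForward are not Composer internals); it shows where act_fn, gated_layer_bias, and non_gated_layer_bias plug in.

import torch.nn as nn

class GLUFeedForward(nn.Module):
    # Illustrative sketch of a gated feed-forward block, not Composer's internal module.
    def __init__(self, d_model, d_ff, act_fn=None,
                 gated_layer_bias=False, non_gated_layer_bias=False):
        super().__init__()
        # Two parallel up-projections replace the single Linear of a standard FFN.
        self.gated_proj = nn.Linear(d_model, d_ff, bias=gated_layer_bias)
        self.non_gated_proj = nn.Linear(d_model, d_ff, bias=non_gated_layer_bias)
        self.act_fn = act_fn if act_fn is not None else nn.GELU()
        self.down_proj = nn.Linear(d_ff, d_model)

    def forward(self, x):
        # Element-wise gating: activation(gated branch) * non-gated branch.
        return self.down_proj(self.act_fn(self.gated_proj(x)) * self.non_gated_proj(x))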
Example
from composer.algorithms import GatedLinearUnits
from composer.trainer import Trainer

algorithm = GatedLinearUnits()

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="1ep",
    algorithms=[algorithm],
    optimizers=[optimizer],
)
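If you want the same module surgery outside the Trainer loop, Composer also exposes a functional form, composer.functional.apply_gated_linear_units. The call below is a sketch that assumes its keyword arguments mirror the class parameters above; check the functional API reference for the exact signature.

import composer.functional as cf

# Swap the FFN Linear layers for GLUs in place, before wrapping the model in DDP.
# Keyword arguments are assumed to mirror the GatedLinearUnits class parameters.
cf.apply_gated_linear_units(
    model,
    optimizers=optimizer,
    act_fn=None,
    gated_layer_bias=False,
    non_gated_layer_bias=False,
)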