GatedLinearUnits
- class composer.algorithms.GatedLinearUnits(act_fn=None, gated_layer_bias=False, non_gated_layer_bias=False)
Replaces all instances of Linear layers in the feed-forward subnetwork with a Gated Linear Unit. The Gated Linear Units provide a more expressive form for the same number of parameters, at the cost of a slight degradation in throughput.
Runs on Event.INIT, so it can swap the Linear layers in the FFN for GLUs before the model is DDP-wrapped.
- Parameters
act_fn (Callable[[Tensor], Tensor], optional) – Optionally, the activation function to use. If None, the algorithm will use the existing activation function in the model.
gated_layer_bias (bool, optional) – Whether to use a bias in the gated linear layer of the GLU. Default: False.
non_gated_layer_bias (bool, optional) – Whether to use a bias in the non-gated linear layer of the GLU. Default: False.
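For intuition, the GLU block that replaces a standard feed-forward Linear layer computes an element-wise gate over two parallel projections. The sketch below is illustrative only (module and attribute names such as GLUFeedForward are not Composer internals); it shows where act_fn, gated_layer_bias, and non_gated_layer_bias plug in.

import torch.nn as nn

class GLUFeedForward(nn.Module):
    # Illustrative sketch of a gated feed-forward block, not Composer's internal module.
    def __init__(self, d_model, d_ff, act_fn=None,
                 gated_layer_bias=False, non_gated_layer_bias=False):
        super().__init__()
        # Two parallel up-projections replace the single Linear of a standard FFN.
        self.gated_proj = nn.Linear(d_model, d_ff, bias=gated_layer_bias)
        self.non_gated_proj = nn.Linear(d_model, d_ff, bias=non_gated_layer_bias)
        self.act_fn = act_fn if act_fn is not None else nn.GELU()
        self.down_proj = nn.Linear(d_ff, d_model)

    def forward(self, x):
        # Element-wise gating: activation(gated branch) * non-gated branch.
        return self.down_proj(self.act_fn(self.gated_proj(x)) * self.non_gated_proj(x))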
Example
from composer.algorithms import GatedLinearUnits
from composer.trainer import Trainer

algorithm = GatedLinearUnits()

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="1ep",
    algorithms=[algorithm],
    optimizers=[optimizer],
)
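If you want the same module surgery outside the Trainer loop, Composer also exposes a functional form, composer.functional.apply_gated_linear_units. The call below is a sketch that assumes its keyword arguments mirror the class parameters above; check the functional API reference for the exact signature.

import composer.functional as cf

# Swap the FFN Linear layers for GLUs in place, before wrapping the model in DDP.
# Keyword arguments are assumed to mirror the GatedLinearUnits class parameters.
cf.apply_gated_linear_units(
    model,
    optimizers=optimizer,
    act_fn=None,
    gated_layer_bias=False,
    non_gated_layer_bias=False,
)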