composer.algorithms

Efficiency methods for training.

Examples include label smoothing (LabelSmoothing) and adding SqueezeExcite blocks, among many others.

Algorithms are implemented in both a standalone functional form (see composer.functional) and as subclasses of Algorithm for integration in the Composer Trainer. The former are easier to integrate piecemeal into an existing codebase. The latter are easier to compose together, since they all have the same public interface and work automatically with the Composer Trainer.
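
For example, the functional form can modify a model in place before any trainer is constructed. A minimal sketch, assuming a standard torchvision ResNet and leaving all keyword arguments at their defaults (exact signatures may vary across Composer versions):

import torchvision.models as models

import composer.functional as cf

model = models.resnet50()

# Apply model surgery directly to the torch.nn.Module; no Trainer is required.
cf.apply_blurpool(model)        # add anti-aliasing filters around strided convolutions
cf.apply_squeeze_excite(model)  # insert Squeeze-and-Excitation blocks after eligible Conv2d modules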

For ease of composability, algorithms in our Trainer are based on the two-way callbacks concept from Howard et al., 2020. Each algorithm implements two methods:

  • Algorithm.match(): returns True if the algorithm should be run given the current State and Event.

  • Algorithm.apply(): performs an in-place modification of the given State.

For example, a simple algorithm that shortens training:

from composer import Algorithm, State, Event, Logger

class ShortenTraining(Algorithm):

    def match(self, state: State, event: Event, logger: Logger) -> bool:
        # run once, when the trainer is initialized
        return event == Event.INIT

    def apply(self, state: State, event: Event, logger: Logger):
        state.max_duration /= 2  # cut training time in half

For more information about events, see Event.
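
When used with the Trainer, the algorithm is passed via the algorithms argument, and its match()/apply() methods are invoked automatically as events fire. A minimal sketch, assuming model is a ComposerModel and train_dataloader is a dataloader defined elsewhere:

from composer import Trainer

trainer = Trainer(
    model=model,                        # assumed: a ComposerModel defined elsewhere
    train_dataloader=train_dataloader,  # assumed: defined elsewhere
    max_duration="20ep",
    algorithms=[ShortenTraining()],     # composes freely with other Algorithm instances
)
trainer.fit()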

Classes

Alibi

ALiBi (Attention with Linear Biases; Press et al., 2021) dispenses with position embeddings and instead directly biases attention matrices such that nearby tokens attend to one another more strongly.

AugMix

The AugMix data augmentation technique.

AugmentAndMixTransform

Wrapper module for augmix_image() that can be passed to torchvision.transforms.Compose.

BlurPool

BlurPool adds anti-aliasing filters to convolutional layers.

ChannelsLast

Changes the memory format of the model to torch.channels_last.

ColOut

Drops a fraction of the rows and columns of an input image and (optionally) a target image.

ColOutTransform

Torchvision-like transform for performing the ColOut augmentation, where random rows and columns are dropped from up to two Torch tensors or two PIL images.

CutMix

CutMix trains the network on non-overlapping combinations of pairs of examples and interpolated targets rather than individual examples and targets.

CutOut

CutOut is a data augmentation technique that works by masking out one or more square regions of an input image.

EMA

Maintains a shadow model with weights that follow the exponential moving average of the trained model weights.

Factorize

Decomposes linear operators into pairs of smaller linear operators.

FusedLayerNorm

Replaces all instances of torch.nn.LayerNorm with an apex.normalization.fused_layer_norm.FusedLayerNorm.

GatedLinearUnits

Replaces all instances of Linear layers in the feed-forward subnetwork with a Gated Linear Unit.

GhostBatchNorm

Replaces batch normalization modules with Ghost Batch Normalization modules that simulate the effect of using a smaller batch size.

GradientClipping

Clips all gradients in the model according to the specified clipping_type.

GyroDropout

Replaces all instances of torch.nn.Dropout with a GyroDropout.

LabelSmoothing

Shrink targets towards a uniform distribution as in Szegedy et al.

LayerFreezing

Progressively freeze the layers of the network during training, starting with the earlier layers.

LowPrecisionLayerNorm

Replaces all instances of torch.nn.LayerNorm with composer.algorithms.low_precision_layernorm.low_precision_layernorm.LPLayerNorm.

MixUp

MixUp trains the network on convex combinations of pairs of examples and targets rather than on individual examples and targets.

NoOpModel

Runs on Event.INIT and replaces the model with a dummy NoOpModelClass instance.

ProgressiveResizing

Resize inputs and optionally outputs by cropping or interpolating.

RandAugment

Randomly applies a sequence of image data augmentations to an image.

RandAugmentTransform

Wraps randaugment_image() in a torchvision-compatible transform.

SAM

Adds sharpness-aware minimization (Foret et al., 2020) by wrapping an existing optimizer with a SAMOptimizer.

SWA

Applies Stochastic Weight Averaging (Izmailov et al., 2018).

SelectiveBackprop

Selectively backpropagate gradients from a subset of each batch.

SeqLengthWarmup

Progressively increases the sequence length during training.

SqueezeExcite

Adds Squeeze-and-Excitation blocks (Hu et al., 2019) after the torch.nn.Conv2d modules in a neural network.

SqueezeExcite2d

Squeeze-and-Excitation block from Hu et al., 2019.

SqueezeExciteConv2d

Helper class used to add a SqueezeExcite2d module after a torch.nn.Conv2d.

StochasticDepth

Applies Stochastic Depth (Huang et al., 2016) to the specified model.

WeightStandardization

Weight Standardization standardizes convolutional weights in a model.
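
Some of the classes above, such as AugmentAndMixTransform, ColOutTransform, and RandAugmentTransform, are torchvision-style transforms rather than Trainer algorithms. A minimal sketch of that usage pattern, assuming default hyperparameters and a hypothetical dataset path:

from torchvision import datasets, transforms

from composer.algorithms import RandAugmentTransform

# Compose the augmentation with standard torchvision transforms.
transform = transforms.Compose([
    RandAugmentTransform(),   # assumed: default severity, depth, and augmentation set
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("path/to/train", transform=transform)  # hypothetical path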