# composer.algorithms

## Modules

| Module | Description |
| --- | --- |
| `composer.algorithms.agc` | Adaptive Gradient Clipping: clips all gradients in the model based on the ratio of gradient norms to parameter norms. |
| `composer.algorithms.algorithm_hparams` | |
| `composer.algorithms.algorithm_registry` | |
| `composer.algorithms.alibi` | ALiBi (Attention with Linear Biases; Press et al, 2021) dispenses with position embeddings for tokens in transformer-based NLP models, instead encoding position information by biasing the query-key attention scores proportionally to each token pair's distance. |
| `composer.algorithms.augmix` | AugMix (Hendrycks et al, 2020) creates multiple independent realizations of sequences of image augmentations, applies each sequence with random intensity, and returns a convex combination of the augmented images and the original image. |
| `composer.algorithms.blurpool` | BlurPool adds anti-aliasing filters to convolutional layers to increase accuracy and invariance to small shifts in the input. |
| `composer.algorithms.channels_last` | Changes the memory format of the model to `torch.channels_last`. |
| `composer.algorithms.colout` | Drops a fraction of the rows and columns of an input image. |
| `composer.algorithms.cutmix` | CutMix trains the network on non-overlapping combinations of pairs of examples and interpolated targets rather than individual examples and targets. |
| `composer.algorithms.cutout` | Cutout is a data augmentation technique that works by masking out one or more square regions of an input image. |
| `composer.algorithms.ema` | Exponential moving average maintains a moving average of model parameters and uses these at test time. |
| `composer.algorithms.factorize` | Decomposes linear operators into pairs of smaller linear operators. |
| `composer.algorithms.ghost_batchnorm` | Replaces batch normalization modules with Ghost Batch Normalization modules that simulate the effect of using a smaller batch size. |
| `composer.algorithms.hparams` | |
| `composer.algorithms.label_smoothing` | Shrinks targets towards a uniform distribution to counteract label noise. |
| `composer.algorithms.layer_freezing` | Progressively freezes the layers of the network during training, starting with the earlier layers. |
| `composer.algorithms.mixup` | Creates new samples using convex combinations of pairs of samples. |
| `composer.algorithms.no_op_model` | Replaces the model with a dummy model of type NoOpModelClass. |
| `composer.algorithms.progressive_resizing` | Applies Fastai's progressive resizing data augmentation to speed up training. |
| `composer.algorithms.randaugment` | Randomly applies a sequence of image data augmentations (Cubuk et al, 2019) to an image. |
| `composer.algorithms.sam` | SAM (Foret et al, 2020) wraps an existing optimizer with a SAMOptimizer, which makes the optimizer minimize both loss value and sharpness. This can improve model generalization and provide robustness to label noise. |
| `composer.algorithms.selective_backprop` | Selective Backprop prunes minibatches according to the difficulty of the individual training examples, and only computes weight gradients over the pruned subset, reducing iteration time and speeding up training. |
| `composer.algorithms.seq_length_warmup` | Sequence length warmup progressively increases the sequence length during training of NLP models. |
| `composer.algorithms.squeeze_excite` | Adds Squeeze-and-Excitation blocks (Hu et al, 2019) after the Conv2d modules in a neural network. |
| `composer.algorithms.stochastic_depth` | Implements stochastic depth (Huang et al, 2016) for ResNet blocks. |
| `composer.algorithms.swa` | Stochastic Weight Averaging (SWA; Izmailov et al, 2018) averages model weights sampled at different times near the end of training. |
| `composer.algorithms.utils` | Helper utilities for algorithms. |
| `composer.algorithms.warnings` | |

Efficiency methods for training.

Examples include LabelSmoothing and the addition of SqueezeExcite blocks, among many others.

Algorithms are implemented both in a standalone functional form (see composer.functional) and as subclasses of Algorithm for integration with the Composer Trainer. The former are easier to integrate piecemeal into an existing codebase. The latter are easier to compose together, since they all have the same public interface and work automatically with the Composer Trainer.
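For instance, the functional form applies a method directly to an existing model with a single call. A minimal sketch using the BlurPool surgery, assuming a standard torchvision model:

```python
from torchvision import models

import composer.functional as cf

model = models.resnet18()

# Replace eligible convolution and pooling layers with anti-aliased
# (BlurPool) versions; the surgery modifies the model in place.
cf.apply_blurpool(model)
```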

For ease of composability, algorithms in our Trainer are based on the two-way callbacks concept from Howard et al, 2020. Each algorithm implements two methods:

• Algorithm.match(): returns True if the algorithm should be run given the current State and Event.

• Algorithm.apply(): performs an in-place modification of the given State.

For example, a simple algorithm that shortens training:

```python
from composer import Algorithm, State, Event, Logger


class ShortenTraining(Algorithm):

    def match(self, state: State, event: Event, logger: Logger) -> bool:
        return event == Event.INIT

    def apply(self, state: State, event: Event, logger: Logger):
        state.max_duration /= 2  # cut training time in half
```


For more information about events, see Event.
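To run the algorithm, pass an instance to the Trainer's algorithms argument. A hedged sketch, with the model and dataloader assumed to be constructed elsewhere:

```python
from composer import Trainer

trainer = Trainer(
    model=model,                        # a ComposerModel, defined elsewhere
    train_dataloader=train_dataloader,  # defined elsewhere
    max_duration="10ep",
    algorithms=[ShortenTraining()],
)
trainer.fit()
```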

## Functions

• `composer.algorithms.algorithm_registry.get_algorithm_registry`

• `composer.algorithms.algorithm_registry.list_algorithms`
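A sketch of how these might be used; the return types below are assumptions inferred from the names, not verified against the source:

```python
from composer.algorithms.algorithm_registry import (
    get_algorithm_registry,
    list_algorithms,
)

# Assumed to return the names of all registered algorithms.
print(list_algorithms())

# Assumed to return a mapping from algorithm name to its hparams class.
registry = get_algorithm_registry()
```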

## Classes

| Class | Description |
| --- | --- |
| `AGC` | Clips all gradients in the model based on the ratio of gradient norms to parameter norms. |
| `Alibi` | ALiBi (Attention with Linear Biases; Press et al, 2021) dispenses with position embeddings and instead directly biases attention matrices such that nearby tokens attend to one another more strongly. |
| `AugMix` | AugMix (Hendrycks et al, 2020) creates `width` sequences of `depth` image augmentations, applies each sequence with random intensity, and returns a convex combination of the `width` augmented images and the original image. |
| `AugmentAndMixTransform` | Wrapper module for augmix_image() that can be passed to torchvision.transforms.Compose. |
| `BlurPool` | BlurPool adds anti-aliasing filters to convolutional layers to increase accuracy and invariance to small shifts in the input. |
| `ChannelsLast` | Changes the memory format of the model to `torch.channels_last`. |
| `ColOut` | Drops a fraction of the rows and columns of an input image and (optionally) a target image. |
| `ColOutTransform` | Torchvision-like transform for performing the ColOut augmentation, where random rows and columns are dropped from up to two Torch tensors or two PIL images. |
| `CutMix` | CutMix trains the network on non-overlapping combinations of pairs of examples and interpolated targets rather than individual examples and targets. |
| `CutOut` | CutOut is a data augmentation technique that works by masking out one or more square regions of an input image. |
| `EMA` | Maintains a shadow model with weights that follow the exponential moving average of the trained model weights. |
| `Factorize` | Decomposes linear operators into pairs of smaller linear operators. |
| `GhostBatchNorm` | Replaces batch normalization modules with Ghost Batch Normalization modules that simulate the effect of using a smaller batch size. |
| `LabelSmoothing` | Shrinks targets towards a uniform distribution, as in Szegedy et al. |
| `LayerFreezing` | Progressively freezes the layers of the network during training, starting with the earlier layers. |
| `MixUp` | MixUp trains the network on convex combinations of pairs of examples and targets rather than individual examples and targets. |
| `NoOpModel` | Runs on Event.INIT and replaces the model with a dummy model of type NoOpModelClass. |
| `ProgressiveResizing` | Applies Fastai's progressive resizing data augmentation to speed up training. |
| `RandAugment` | Randomly applies a sequence of image data augmentations (Cubuk et al, 2019) to an image. |
| `RandAugmentTransform` | Wraps randaugment_image() in a torchvision-compatible transform. |
| `SAM` | Adds sharpness-aware minimization (Foret et al, 2020) by wrapping an existing optimizer with a SAMOptimizer. |
| `SWA` | Applies Stochastic Weight Averaging (Izmailov et al, 2018). |
| `SelectiveBackprop` | Selectively backpropagates gradients from a subset of each batch. |
| `SeqLengthWarmup` | Progressively increases the sequence length during training. |
| `SqueezeExcite` | Adds Squeeze-and-Excitation blocks (Hu et al, 2019) after the Conv2d modules in a neural network. |
| `SqueezeExcite2d` | Squeeze-and-Excitation block from Hu et al, 2019. |
| `SqueezeExciteConv2d` | Helper class used to add a SqueezeExcite2d module after a Conv2d. |
| `StochasticDepth` | Applies Stochastic Depth (Huang et al, 2016) to the specified model. |
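The torchvision-compatible transforms above (AugmentAndMixTransform, ColOutTransform, RandAugmentTransform) slot into an ordinary preprocessing pipeline. A sketch, where the severity and depth arguments are illustrative values rather than verified defaults:

```python
from torchvision import transforms

from composer.algorithms import RandAugmentTransform

# severity/depth are illustrative, not verified defaults.
train_transform = transforms.Compose([
    RandAugmentTransform(severity=9, depth=2),
    transforms.ToTensor(),
])
```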

## Hparams

These classes are used with yahp for YAML-based configuration.

| Class | Description |
| --- | --- |
| `AGCHparams` | See `AGC` |
| `AlgorithmHparams` | Hyperparameters for algorithms. |
| `AlibiHparams` | See `Alibi` |
| `AugMixHparams` | See `AugMix` |
| `BlurPoolHparams` | See `BlurPool` |
| `ChannelsLastHparams` | `ChannelsLast` has no hyperparameters, so this class has no member variables. |
| `ColOutHparams` | See `ColOut` |
| `CutMixHparams` | See `CutMix` |
| `CutOutHparams` | See `CutOut` |
| `EMAHparams` | See `EMA` |
| `FactorizeHparams` | See `Factorize` |
| `GhostBatchNormHparams` | See `GhostBatchNorm` |
| `LabelSmoothingHparams` | See `LabelSmoothing` |
| `LayerFreezingHparams` | See `LayerFreezing` |
| `MixUpHparams` | See `MixUp` |
| `NoOpModelHparams` | See `NoOpModel` |
| `ProgressiveResizingHparams` | See `ProgressiveResizing` |
| `RandAugmentHparams` | See `RandAugment` |
| `SAMHparams` | See `SAM` |
| `SWAHparams` | See `SWA` |
| `SelectiveBackpropHparams` | See `SelectiveBackprop` |
| `SeqLengthWarmupHparams` | See `SeqLengthWarmup` |
| `SqueezeExciteHparams` | See `SqueezeExcite` |
| `StochasticDepthHparams` | See `StochasticDepth` |
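Beyond YAML files, a yahp Hparams dataclass can be constructed in Python and turned into its algorithm. A hedged sketch, assuming these classes expose yahp's initialize_object() and have usable defaults:

```python
from composer.algorithms import BlurPoolHparams

# Assumption: all BlurPoolHparams fields have defaults, and
# initialize_object() (from yahp) builds the BlurPool algorithm.
hparams = BlurPoolHparams()
blurpool = hparams.initialize_object()
```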

## Methods

• load_multiple()

• load()