CutMix
Image from CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features by Yun et al., 2019
Tags: Vision, Increased Accuracy, Increased GPU Usage, Method, Augmentation, Regularization
TL;DR
CutMix trains the network on images from which a small patch has been cut out and replaced with a different image. Training in this fashion improves generalization performance.
Attribution
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features by Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Published in ICCV 2019.
Hyperparameters
alpha
- The parameter that controls the distribution from which the area of the cut-out region is drawn when performing CutMix. This is a symmetric Beta distribution, meaning that alpha serves as both parameters of the Beta distribution. The actual area of the cut-out region may differ from the sampled value if the selected region does not lie entirely within the image.
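As a concrete illustration, the mixing value can be drawn from a symmetric Beta distribution with nothing more than the standard library (a sketch of the sampling step only; the library's own code may differ):

```python
import random

random.seed(0)  # for reproducibility of this sketch
alpha = 1.0
# Beta(1, 1) is the uniform distribution on [0, 1].
lam = random.betavariate(alpha, alpha)  # fraction of the image kept
cut_area_frac = 1.0 - lam               # intended area of the replaced patch
print(lam, cut_area_frac)
```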
Example Effects
CutMix is intended to improve generalization performance, and we empirically find this to be the case in our image classification settings. The original paper also reports improvements in object localization and robustness.
Implementation Details
The samples are created from a batch (X, y) of (inputs, targets) together with a version (X', y') in which the ordering of examples has been shuffled. The examples are combined by sampling a value lambda (between 0.0 and 1.0) from the Beta distribution parameterized by alpha, choosing a rectangular box within X, filling it with the data from the corresponding region in X', and training the network on the interpolation between (X, y) and (X', y').
Note that the same lambda and rectangular box are used for each example in the batch. As with MixUp, using the shuffled version of a batch to generate mixed samples allows CutMix to be used without loading additional data.
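The procedure above can be sketched in plain Python on nested-list "images" (an illustrative sketch, not the library implementation; `cutmix_batch` and all of its internals are invented here for clarity):

```python
import random

def cutmix_batch(X, y, alpha, n_classes):
    """Minimal CutMix sketch. X: list of H x W images (lists of rows);
    y: list of integer class labels."""
    B = len(X)
    H, W = len(X[0]), len(X[0][0])
    perm = random.sample(range(B), B)        # shuffled pairing of examples
    lam = random.betavariate(alpha, alpha)   # same lambda for the whole batch
    # Box with area roughly (1 - lam) * H * W, centered at a random point.
    cut = (1.0 - lam) ** 0.5
    ch, cw = int(H * cut), int(W * cut)
    cy, cx = random.randrange(H), random.randrange(W)
    y1, y2 = max(cy - ch // 2, 0), min(cy + ch // 2, H)
    x1, x2 = max(cx - cw // 2, 0), min(cx + cw // 2, W)
    X_mix = [[row[:] for row in img] for img in X]  # copy the inputs
    for i in range(B):
        src = X[perm[i]]
        for r in range(y1, y2):
            X_mix[i][r][x1:x2] = src[r][x1:x2]  # paste the shuffled patch
    # Adjust lambda to the clipped box area, then interpolate one-hot targets.
    lam_adj = 1.0 - ((x2 - x1) * (y2 - y1)) / (H * W)
    y_mix = []
    for i in range(B):
        t = [0.0] * n_classes
        t[y[i]] += lam_adj
        t[y[perm[i]]] += 1.0 - lam_adj
        y_mix.append(t)
    return X_mix, y_mix
```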
Suggested Hyperparameters
alpha = 1
is a common choice.
Considerations
CutMix adds a little extra GPU compute and memory to create samples.
CutMix also requires a cost function that can accept dense target vectors, rather than an index of a corresponding 1-hot vector as is a common default (e.g., cross entropy with hard labels).
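For example, a cross-entropy loss that accepts dense target vectors can be written as follows (an illustrative sketch; `soft_cross_entropy` is a hypothetical name, not a Composer API):

```python
import math

def soft_cross_entropy(logits, target):
    """Cross entropy that accepts a dense (soft) target vector rather than
    a class index -- the kind of loss CutMix's interpolated labels require.
    logits, target: plain Python lists of floats of equal length."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    log_probs = [l - log_z for l in logits]
    return -sum(t * lp for t, lp in zip(target, log_probs))
```

Because the loss is linear in the target, an interpolated target yields the corresponding interpolation of the two hard-label losses.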
Composability
As a general rule, combining regularization-based methods yields sublinear improvements to accuracy. This holds true for CutMix.
This method interacts with other methods that alter the inputs (such as CutOut) or the targets (such as label smoothing). While such methods often still compose well with CutMix in terms of improved accuracy, it is important to ensure that their implementations compose.
Code
- class composer.algorithms.cutmix.CutMix(alpha: float)[source]
CutMix trains the network on non-overlapping combinations of pairs of examples and interpolated targets rather than individual examples and targets.
This is done by taking a non-overlapping combination of a given batch X with a randomly permuted copy of X. The area is drawn from a
Beta(alpha, alpha)
distribution. Training in this fashion reduces generalization error.
- Parameters
alpha – the pseudocount for the Beta distribution used to sample area parameters. As alpha grows, the two samples in each pair tend to be weighted more equally. As alpha approaches 0 from above, the combination approaches using only one element of the pair.
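The effect of alpha on the sampled weights can be checked empirically with a short standard-library sketch (`mean_extremity` is invented here for illustration):

```python
import random
import statistics

def mean_extremity(alpha, n=2000, seed=0):
    """Average distance of Beta(alpha, alpha) samples from 0.5.
    Larger values mean the pair weighting is more lopsided."""
    rng = random.Random(seed)
    return statistics.mean(abs(rng.betavariate(alpha, alpha) - 0.5)
                           for _ in range(n))

# Small alpha concentrates lambda near 0 and 1 (one element dominates);
# large alpha concentrates lambda near 0.5 (roughly equal weighting).
print(mean_extremity(0.1), mean_extremity(1.0), mean_extremity(10.0))
```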
- composer.algorithms.cutmix.cutmix.gen_indices(x: Tensor) Tensor [source]
Generates indices of a random permutation of elements of a batch.
- Parameters
x – input tensor of shape (B, d1, d2, …, dn), B is batch size, d1-dn are feature dimensions.
- Returns
indices – A random permutation of the batch indices.
- composer.algorithms.cutmix.cutmix.gen_cutmix_lambda(alpha: float) float [source]
Generates lambda from
Beta(alpha, alpha)
- Parameters
alpha – Parameter for the Beta(alpha, alpha) distribution
- Returns
cutmix_lambda – Lambda parameter for performing cutmix.
- composer.algorithms.cutmix.cutmix.rand_bbox(W: int, H: int, cutmix_lambda: float, cx: Optional[int] = None, cy: Optional[int] = None) Tuple[int, int, int, int] [source]
Randomly samples a bounding box with area determined by cutmix_lambda.
Adapted from original implementation https://github.com/clovaai/CutMix-PyTorch
- Parameters
W – Width of the image
H – Height of the image
cutmix_lambda – Lambda param from cutmix, used to set the area of the box.
cx – Optional x coordinate of the center of the box.
cy – Optional y coordinate of the center of the box.
- Returns
bbx1 – Leftmost edge of the bounding box
bby1 – Top edge of the bounding box
bbx2 – Rightmost edge of the bounding box
bby2 – Bottom edge of the bounding box
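A sketch of how such a box might be sampled, modeled on the behavior described above (the library's actual implementation may differ in its details):

```python
import random

def rand_bbox(W, H, cutmix_lambda, cx=None, cy=None):
    """Sample a box whose intended area is (1 - cutmix_lambda) * W * H,
    clipped to the image bounds. Illustrative sketch only."""
    cut_ratio = (1.0 - cutmix_lambda) ** 0.5  # side-length scale factor
    cut_w, cut_h = int(W * cut_ratio), int(H * cut_ratio)
    if cx is None:
        cx = random.randrange(W)  # random box center
    if cy is None:
        cy = random.randrange(H)
    bbx1 = max(cx - cut_w // 2, 0)  # clip each edge to the image
    bby1 = max(cy - cut_h // 2, 0)
    bbx2 = min(cx + cut_w // 2, W)
    bby2 = min(cy + cut_h // 2, H)
    return bbx1, bby1, bbx2, bby2
```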
- composer.algorithms.cutmix.cutmix.adjust_lambda(cutmix_lambda: float, x: Tensor, bbox: Tuple) float [source]
Rescale the cutmix lambda according to the size of the clipped bounding box.
- Parameters
cutmix_lambda – Lambda param from cutmix, used to set the area of the box.
x – input tensor of shape (B, d1, d2, …, dn), B is batch size, d1-dn are feature dimensions.
bbox – (x1, y1, x2, y2) coordinates of the bounding box, obeying x2 > x1, y2 > y1.
- Returns
adjusted_lambda – Rescaled cutmix_lambda to account for the part of the bounding box that may fall outside the bounds of the input.
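The adjustment itself amounts to recomputing lambda as the uncovered fraction of the image (a sketch; the library version takes the input tensor rather than explicit width and height):

```python
def adjust_lambda(cutmix_lambda, W, H, bbox):
    """Return the fraction of the W x H image NOT covered by the (clipped)
    box. When the box was clipped at the image border, this differs from
    the originally sampled cutmix_lambda."""
    x1, y1, x2, y2 = bbox
    return 1.0 - ((x2 - x1) * (y2 - y1)) / (W * H)
```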
- composer.algorithms.cutmix.cutmix(x: Tensor, y: Tensor, alpha: float, n_classes: int, cutmix_lambda: Optional[float] = None, bbox: Optional[Tuple] = None, indices: Optional[Tensor] = None) Tuple[Tensor, Tensor] [source]
Create new samples using combinations of pairs of samples.
This is done by masking a region of x, and filling the masked region with a permuted copy of x. The cutmix parameter lambda should be chosen from a
Beta(alpha, alpha)
distribution for some parameter alpha > 0. The area of the masked region is determined by lambda, and so labels are interpolated accordingly. Note that the same lambda is used for all examples within the batch. The original paper used a fixed value of alpha = 1. Both the original and shuffled labels are returned because for many loss functions (such as cross entropy) the targets are given as indices, so interpolation must be handled separately.
- Parameters
x – input tensor of shape (B, d1, d2, …, dn), B is batch size, d1-dn are feature dimensions.
y – target tensor of shape (B, f1, f2, …, fm), B is batch size, f1-fn are possible target dimensions.
alpha – parameter for the beta distribution of the cutmix region size.
n_classes – total number of classes.
cutmix_lambda – optional, fixed size of cutmix region.
bbox – optional, predetermined (rx1, ry1, rx2, ry2) coords of the bounding box.
indices – Permutation of the batch indices 1..B. Used for permuting without randomness.
- Returns
x_cutmix – batch of inputs after cutmix has been applied.
y_cutmix – labels after cutmix has been applied.
Example
from composer import functional as CF
- for X, y in dataloader:
X, y = CF.cutmix(X, y, alpha, n_classes)
pred = model(X)
loss = loss_fun(pred, y)  # loss_fun must accept dense labels (i.e., NOT indices)