CutMix
Image from CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features by Yun et al., 2019
Tags: Vision, Increased Accuracy, Increased GPU Usage, Method, Augmentation, Regularization
TL;DR
CutMix trains the network on images from which a small patch has been cut out and replaced with a different image. Training in this fashion improves generalization performance.
Attribution
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features by Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Published in ICCV 2019.
Hyperparameters
alpha
- The parameter that controls the distribution from which the area of the cut-out region is drawn when performing CutMix. This is a symmetric Beta distribution, meaning that alpha serves as both parameters of the Beta distribution. The actual area of the cut-out region may differ from the sampled value if the selected region does not lie entirely within the image.
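As a concrete illustration, the mixing value can be drawn from a symmetric Beta distribution with nothing more than the standard library (a sketch of the sampling step only; the library's own code may differ):

```python
import random

random.seed(0)  # for reproducibility of this sketch
alpha = 1.0
# Beta(1, 1) is the uniform distribution on [0, 1].
lam = random.betavariate(alpha, alpha)  # fraction of the image kept
cut_area_frac = 1.0 - lam               # intended area of the replaced patch
print(lam, cut_area_frac)
```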
Example Effects
CutMix is intended to improve generalization performance, and we empirically find this to be the case in our image classification settings. The original paper also reports improvements in object localization and robustness.
Implementation Details
The samples are created from a batch (X, y) of (inputs, targets) together with a version (X', y') in which the ordering of examples has been shuffled. The examples are combined by sampling a value lambda (between 0.0 and 1.0) from the Beta distribution parameterized by alpha, choosing a rectangular box within X, filling it with the data from the corresponding region in X', and training the network on the interpolation between (X, y) and (X', y').
Note that the same lambda and rectangular box are used for each example in the batch. As with MixUp, using the shuffled version of a batch to generate mixed samples allows CutMix to be used without loading additional data.
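The procedure above can be sketched in plain Python on nested-list "images" (an illustrative sketch, not the library implementation; `cutmix_batch` and all of its internals are invented here for clarity):

```python
import random

def cutmix_batch(X, y, alpha, n_classes):
    """Minimal CutMix sketch. X: list of H x W images (lists of rows);
    y: list of integer class labels."""
    B = len(X)
    H, W = len(X[0]), len(X[0][0])
    perm = random.sample(range(B), B)        # shuffled pairing of examples
    lam = random.betavariate(alpha, alpha)   # same lambda for the whole batch
    # Box with area roughly (1 - lam) * H * W, centered at a random point.
    cut = (1.0 - lam) ** 0.5
    ch, cw = int(H * cut), int(W * cut)
    cy, cx = random.randrange(H), random.randrange(W)
    y1, y2 = max(cy - ch // 2, 0), min(cy + ch // 2, H)
    x1, x2 = max(cx - cw // 2, 0), min(cx + cw // 2, W)
    X_mix = [[row[:] for row in img] for img in X]  # copy the inputs
    for i in range(B):
        src = X[perm[i]]
        for r in range(y1, y2):
            X_mix[i][r][x1:x2] = src[r][x1:x2]  # paste the shuffled patch
    # Adjust lambda to the clipped box area, then interpolate one-hot targets.
    lam_adj = 1.0 - ((x2 - x1) * (y2 - y1)) / (H * W)
    y_mix = []
    for i in range(B):
        t = [0.0] * n_classes
        t[y[i]] += lam_adj
        t[y[perm[i]]] += 1.0 - lam_adj
        y_mix.append(t)
    return X_mix, y_mix
```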
Suggested Hyperparameters
alpha = 1
is a common choice.
Considerations
CutMix adds a little extra GPU compute and memory to create samples.
CutMix also requires a cost function that can accept dense target vectors, rather than an index of a corresponding 1-hot vector as is a common default (e.g., cross entropy with hard labels).
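For example, a cross-entropy loss that accepts dense target vectors can be written as follows (an illustrative sketch; `soft_cross_entropy` is a hypothetical name, not a Composer API):

```python
import math

def soft_cross_entropy(logits, target):
    """Cross entropy that accepts a dense (soft) target vector rather than
    a class index -- the kind of loss CutMix's interpolated labels require.
    logits, target: plain Python lists of floats of equal length."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    log_probs = [l - log_z for l in logits]
    return -sum(t * lp for t, lp in zip(target, log_probs))
```

Because the loss is linear in the target, an interpolated target yields the corresponding interpolation of the two hard-label losses.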
Composability
As a general rule, combining regularization-based methods yields sublinear improvements to accuracy. This holds true for CutMix.
This method interacts with other methods that alter the inputs (such as CutOut) or the targets (such as label smoothing). While such methods often still compose well with CutMix in terms of improved accuracy, it is important to ensure that their implementations compose.
Code
- class composer.algorithms.cutmix.CutMix(alpha: float)[source]
CutMix trains the network on non-overlapping combinations of pairs of examples and interpolated targets rather than individual examples and targets.
This is done by taking a non-overlapping combination of a given batch X with a randomly permuted copy of X. The area is drawn from a
Beta(alpha, alpha)
distribution. Training in this fashion reduces generalization error.
- Parameters
alpha – the pseudocount for the Beta distribution used to sample area parameters. As alpha grows, the two samples in each pair tend to be weighted more equally. As alpha approaches 0 from above, the combination approaches using only one element of the pair.
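The effect of alpha on the sampled weights can be checked empirically with a short standard-library sketch (`mean_extremity` is invented here for illustration):

```python
import random
import statistics

def mean_extremity(alpha, n=2000, seed=0):
    """Average distance of Beta(alpha, alpha) samples from 0.5.
    Larger values mean the pair weighting is more lopsided."""
    rng = random.Random(seed)
    return statistics.mean(abs(rng.betavariate(alpha, alpha) - 0.5)
                           for _ in range(n))

# Small alpha concentrates lambda near 0 and 1 (one element dominates);
# large alpha concentrates lambda near 0.5 (roughly equal weighting).
print(mean_extremity(0.1), mean_extremity(1.0), mean_extremity(10.0))
```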
- composer.algorithms.cutmix.cutmix.gen_indices(x: Tensor) Tensor [source]
Generates indices of a random permutation of elements of a batch.
- Parameters
x – input tensor of shape (B, d1, d2, …, dn), B is batch size, d1-dn are feature dimensions.
- Returns
indices – A random permutation of the batch indices.
- composer.algorithms.cutmix.cutmix.gen_cutmix_lambda(alpha: float) float [source]
Generates lambda from
Beta(alpha, alpha)
- Parameters
alpha – Parameter for the Beta(alpha, alpha) distribution
- Returns
cutmix_lambda – Lambda parameter for performing cutmix.
- composer.algorithms.cutmix.cutmix.rand_bbox(W: int, H: int, cutmix_lambda: float, cx: Optional[int] = None, cy: Optional[int] = None) Tuple[int, int, int, int] [source]
Randomly samples a bounding box with area determined by cutmix_lambda.
Adapted from original implementation https://github.com/clovaai/CutMix-PyTorch
- Parameters
W – Width of the image
H – Height of the image
cutmix_lambda – Lambda param from cutmix, used to set the area of the box.
cx – Optional x coordinate of the center of the box.
cy – Optional y coordinate of the center of the box.
- Returns
bbx1 – Leftmost edge of the bounding box
bby1 – Top edge of the bounding box
bbx2 – Rightmost edge of the bounding box
bby2 – Bottom edge of the bounding box
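A sketch of how such a box might be sampled, modeled on the behavior described above (the library's actual implementation may differ in its details):

```python
import random

def rand_bbox(W, H, cutmix_lambda, cx=None, cy=None):
    """Sample a box whose intended area is (1 - cutmix_lambda) * W * H,
    clipped to the image bounds. Illustrative sketch only."""
    cut_ratio = (1.0 - cutmix_lambda) ** 0.5  # side-length scale factor
    cut_w, cut_h = int(W * cut_ratio), int(H * cut_ratio)
    if cx is None:
        cx = random.randrange(W)  # random box center
    if cy is None:
        cy = random.randrange(H)
    bbx1 = max(cx - cut_w // 2, 0)  # clip each edge to the image
    bby1 = max(cy - cut_h // 2, 0)
    bbx2 = min(cx + cut_w // 2, W)
    bby2 = min(cy + cut_h // 2, H)
    return bbx1, bby1, bbx2, bby2
```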
- composer.algorithms.cutmix.cutmix.adjust_lambda(cutmix_lambda: float, x: Tensor, bbox: Tuple) float [source]
Rescale the cutmix lambda according to the size of the clipped bounding box.
- Parameters
cutmix_lambda – Lambda param from cutmix, used to set the area of the box.
x – input tensor of shape (B, d1, d2, …, dn), B is batch size, d1-dn are feature dimensions.
bbox – (x1, y1, x2, y2) coordinates of the bounding box, obeying x2 > x1, y2 > y1.
- Returns
adjusted_lambda – Rescaled cutmix_lambda to account for the part of the bounding box that may fall outside the bounds of the input.
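The adjustment itself amounts to recomputing lambda as the uncovered fraction of the image (a sketch; the library version takes the input tensor rather than explicit width and height):

```python
def adjust_lambda(cutmix_lambda, W, H, bbox):
    """Return the fraction of the W x H image NOT covered by the (clipped)
    box. When the box was clipped at the image border, this differs from
    the originally sampled cutmix_lambda."""
    x1, y1, x2, y2 = bbox
    return 1.0 - ((x2 - x1) * (y2 - y1)) / (W * H)
```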
- composer.algorithms.cutmix.cutmix(x: Tensor, y: Tensor, alpha: float, n_classes: int, cutmix_lambda: Optional[float] = None, bbox: Optional[Tuple] = None, indices: Optional[Tensor] = None) Tuple[Tensor, Tensor] [source]
Create new samples using combinations of pairs of samples.
This is done by masking a region of x, and filling the masked region with a permuted copy of x. The cutmix parameter lambda should be chosen from a
Beta(alpha, alpha)
distribution for some parameter alpha > 0. The area of the masked region is determined by lambda, and so labels are interpolated accordingly. Note that the same lambda is used for all examples within the batch. The original paper used a fixed value of alpha = 1. Both the original and shuffled labels are returned because for many loss functions (such as cross entropy) the targets are given as indices, so interpolation must be handled separately.
- Parameters
x – input tensor of shape (B, d1, d2, …, dn), B is batch size, d1-dn are feature dimensions.
y – target tensor of shape (B, f1, f2, …, fm), B is batch size, f1-fn are possible target dimensions.
alpha – parameter for the beta distribution of the cutmix region size.
n_classes – total number of classes.
cutmix_lambda – optional, fixed size of cutmix region.
bbox – optional, predetermined (rx1, ry1, rx2, ry2) coords of the bounding box.
indices – Permutation of the batch indices 1..B. Used for permuting without randomness.
- Returns
x_cutmix – batch of inputs after cutmix has been applied.
y_cutmix – labels after cutmix has been applied.
Example
from composer import functional as CF
- for X, y in dataloader:
X, y = CF.cutmix(X, y, alpha, n_classes)
pred = model(X)
loss = loss_fun(pred, y)  # loss_fun must accept dense labels (i.e., NOT indices)