This tutorial is available as a Jupyter notebook.

♻️ Auto Grad Accum

Have you ever wanted to choose your batch size without having to stress about CUDA Out-of-Memory (OOM) errors? We sure have. That’s why we built Composer’s automatic gradient accumulation feature.

This tutorial will demonstrate how to use automatic gradient accumulation to avoid CUDA OOMs, regardless of your batch size choice, GPU type, and number of devices.

Note that this tutorial requires a GPU, since automatic gradient accumulation works by responding to CUDA Out-of-Memory errors.

Tutorial Goals and Concepts Covered

The goal of this tutorial is to show you how to turn on automatic gradient accumulation and to provide a sandbox to play around with it a bit. Please feel free to experiment with different batch sizes and other configuration choices to see how it works!

For details of the implementation, see our Auto Grad Accum documentation.

Let’s get started!

Set Up Our Workspace

We’ll start by installing Composer:

[ ]:
%pip install mosaicml
# To install from source instead of the last release, comment the command above and uncomment the following one.
# %pip install git+https://github.com/mosaicml/composer.git

We are going to use the CIFAR-10 dataset with a ResNet-56 model and some standard optimization settings. For the purposes of this tutorial, we’ll choose a very large batch size (2048) and upscale the images from their native 32x32 to 96x96. Without gradient accumulation, these settings will cause CUDA Out-of-Memory errors on most GPUs.
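
To see why the upscaling matters, here is some back-of-the-envelope arithmetic as a small Python sketch. The helper `input_batch_bytes` is hypothetical (not part of Composer) and counts only the float32 input tensor; activation memory, which dominates in practice, grows roughly the same way with batch size and image area.

```python
def input_batch_bytes(batch_size, channels, height, width, bytes_per_element=4):
    """Memory for one float32 input batch of shape (N, C, H, W), in bytes."""
    return batch_size * channels * height * width * bytes_per_element

native = input_batch_bytes(2048, 3, 32, 32)     # CIFAR-10's native 32x32 images
upscaled = input_batch_bytes(2048, 3, 96, 96)   # after Resize(size=[96, 96])

print(f"32x32 input batch: {native / 2**20:.0f} MiB")    # 24 MiB
print(f"96x96 input batch: {upscaled / 2**20:.0f} MiB")  # 216 MiB
print(f"ratio: {upscaled / native:.0f}x")                # 9x
```

Tripling the side length multiplies the pixel count, and hence the per-sample memory, by 9.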

[ ]:
import torch

import composer
from torchvision import datasets, transforms

torch.manual_seed(42) # For replicability

data_directory = "./data"

# Normalization constants
mean = (0.507, 0.487, 0.441)
std = (0.267, 0.256, 0.276)

# choose a very large batch size
batch_size = 2048

cifar10_transforms = transforms.Compose([
  transforms.ToTensor(),  # Convert PIL images to tensors; Normalize expects tensors
  transforms.Normalize(mean, std),
  transforms.Resize(size=[96, 96])  # choose a large image size
])

train_dataset = datasets.CIFAR10(data_directory, train=True, download=True, transform=cifar10_transforms)
test_dataset = datasets.CIFAR10(data_directory, train=False, download=True, transform=cifar10_transforms)

train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)  # No need to shuffle evaluation data
[ ]:
from composer import models
model = models.composer_resnet_cifar(model_name='resnet_56', num_classes=10)

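optimizer = composer.optim.DecoupledSGDW(
    model.parameters(), # Model parameters to update
    lr=0.05,  # Illustrative hyperparameters; adjust for your setup
    momentum=0.9,
    weight_decay=2.0e-3,
)
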
Train a Baseline Model

Now we create a trainer with the grad_accum='auto' setting and train the model.

[ ]:
assert torch.cuda.is_available(), "Demonstrating automatic gradient accumulation requires a GPU."

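trainer = composer.trainer.Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=test_dataloader,
    optimizers=optimizer,
    max_duration='1ep',  # Illustrative: a single epoch is enough for this demo
    grad_accum='auto',  # <--- Activate Composer magic!
    device='gpu',
)
trainer.fit()  # Train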

Depending on your GPU type, you should see log messages at the start of training as the gradient accumulation increases until the model fits into memory, for example:

INFO:composer.trainer.trainer:CUDA out of memory detected. Gradient Accumulation increased from 1 -> 2, and the batch will be retrained.

Worry not! This just means everything is working as expected. With automatic gradient accumulation enabled, Composer responds to OOM errors during training by doubling the gradient accumulation and retrying the batch. Under the hood, each minibatch is split into n “microbatches”, where n is the gradient accumulation value, and gradients are accumulated across all microbatches before the optimizer steps, so each optimizer step still reflects the full minibatch. You should therefore expect n to keep doubling (1, 2, 4, and so on) until the microbatch size, minibatch size / n, fits on the device; with a batch size of 2048, for example, n = 8 gives microbatches of 256 samples. This lets you focus on choosing the best minibatch size without having to stress about what your hardware can handle.
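
The loop below is a minimal plain-PyTorch sketch of the idea described above, not Composer's actual implementation: `accumulated_step` and `auto_accum_step` are hypothetical names, and recognizing an OOM by finding "out of memory" in a `RuntimeError` message is an assumption. Each microbatch loss is scaled by its share of the minibatch, so for a mean-reduced loss the accumulated gradients match a single full-minibatch step.

```python
import torch


def accumulated_step(model, optimizer, loss_fn, inputs, targets, grad_accum):
    """One optimizer step on a minibatch split into `grad_accum` microbatches."""
    optimizer.zero_grad()
    batch_size = inputs.shape[0]
    for x, y in zip(inputs.chunk(grad_accum), targets.chunk(grad_accum)):
        # Scale each microbatch's mean loss by its share of the minibatch, so the
        # summed gradients equal the gradient of the full-minibatch mean loss.
        loss = loss_fn(model(x), y) * (x.shape[0] / batch_size)
        loss.backward()  # gradients add up in each parameter's .grad
    optimizer.step()


def auto_accum_step(model, optimizer, loss_fn, inputs, targets, grad_accum=1):
    """Retry the step, doubling `grad_accum` after each out-of-memory error.

    Returns the accumulation value that succeeded, so the caller can reuse it.
    """
    while True:
        try:
            accumulated_step(model, optimizer, loss_fn, inputs, targets, grad_accum)
            return grad_accum
        except RuntimeError as err:
            # Assumption: OOMs are recognized by their message. Re-raise other
            # errors, or an OOM that persists with microbatches of size 1.
            if "out of memory" not in str(err) or grad_accum >= inputs.shape[0]:
                raise
            optimizer.zero_grad(set_to_none=True)  # release partial gradients
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            grad_accum *= 2
```

In a training loop you would carry the returned value forward, e.g. `grad_accum = auto_accum_step(model, optimizer, loss_fn, x, y, grad_accum)`, so the search for a fitting microbatch size only happens once.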

What next?

You’ve now seen how to turn on automatic gradient accumulation using the Composer trainer.

To dig deeper, see our Auto Grad Accum documentation.

In addition, please continue to explore our other tutorials!

Come get involved with MosaicML!

We’d love for you to get involved with the MosaicML community in any of these ways:

Star Composer on GitHub

Help make others aware of our work by starring Composer on GitHub.

Join the MosaicML Slack

Head on over to the MosaicML Slack to join other ML efficiency enthusiasts. Come for the paper discussions, stay for the memes!

Contribute to Composer

Is there a bug you noticed or a feature you’d like? File an issue or make a pull request!