This tutorial is available as a Jupyter notebook.

ƒ() Functional API

In this tutorial, we’ll see an example of using Composer’s algorithms in a standalone fashion with no changes to the surrounding code and no requirement to use the Composer trainer.

Tutorial Goals and Concepts Covered

The key new concept introduced here is the functional API for algorithms. The goal of this tutorial is to provide some familiarity with its usage.

We’ll be training a simple model on CIFAR-10, similar to the PyTorch classifier tutorial. Because we’ll be using a toy model trained for only a few epochs, we won’t get the same speed or accuracy gains we might expect from a more realistic problem. However, this tutorial should still serve as a useful illustration of how to use various algorithms. For examples of more realistic results, see the MosaicML Explorer.

Install Composer

If you don’t already have Composer installed, install it:

[ ]:
%pip install mosaicml
# To install from source instead of the latest release, comment out the command above and uncomment the following one.
# %pip install git+https://github.com/mosaicml/composer.git

Define the Model, Dataloader, and Training Loop

First, we need to define our original model, dataloader, and training loop. Let’s start with the dataloader:

[ ]:
import torch
import torch.utils.data
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

datadir = './data'
batch_size = 1024

# Convert images to tensors, then normalize each channel
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root=datadir, train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root=datadir, train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

As you can see, we compose two transforms: one that converts the images to tensors and one that normalizes them. We apply these transformations to both the train and test sets.
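
To make the normalization step concrete: converting an image to a tensor scales 8-bit pixel values into [0, 1], and normalizing with a mean of 0.5 and a standard deviation of 0.5 then maps each channel value `x` to `(x - 0.5) / 0.5`, that is, into [-1, 1]. The plain-Python sketch below reproduces only that arithmetic for single values; `normalize_value` is a hypothetical helper written for illustration, not a torchvision function.

```python
def normalize_value(pixel_uint8, mean=0.5, std=0.5):
    """Mimic the tensor conversion and normalization for one 8-bit channel value."""
    scaled = pixel_uint8 / 255.0      # tensor conversion: [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std      # normalization: [0.0, 1.0] -> [-1.0, 1.0]

# The darkest and brightest possible values land on the ends of [-1, 1]
for value in (0, 255):
    print(value, "->", normalize_value(value))
```

Centering the inputs around zero in this way is a common preprocessing choice; the specific mean and standard deviation of 0.5 are simply the values used in this tutorial.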

Now, let’s define our model. We’re going to use a toy convolutional neural network so that the training finishes quickly.

[ ]:
class Net(nn.Module):
    def __init__(self):
        # nn.Module must be initialized before submodules are assigned
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=(3, 3), stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=(3, 3))
        self.norm = nn.BatchNorm2d(32)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(32, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.conv2(x)
        x = F.relu(self.norm(x))
        x = torch.flatten(self.pool(x), 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
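
As a sanity check on the layer sizes, the sketch below traces the spatial size of a 32x32 CIFAR-10 image through the two convolutions using the standard output-size formula for a convolution, `floor((n + 2p - k) / s) + 1`. `conv_out_size` is a hypothetical helper written for this illustration.

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

size = 32                                              # CIFAR-10 images are 32x32
after_conv1 = conv_out_size(size, kernel=3, stride=2)  # conv1: 3x3 kernel, stride 2
after_conv2 = conv_out_size(after_conv1, kernel=3)     # conv2: 3x3 kernel, stride 1
print(after_conv1, after_conv2)  # 15 13
```

The adaptive average pool then collapses the 13x13 feature map to 1x1, so the flattened vector has 32 features, which matches the input size of `fc1`.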

Finally, let’s write a simple training loop that prints the accuracy on the test set at the end of each epoch. We’ll just run a few epochs for brevity.

[ ]:
from tqdm.notebook import tqdm

num_epochs = 5

def train_and_eval(model, train_loader, test_loader):
    # Set up the model and optimizer
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters())
    # Run one or more epochs
    for epoch in range(num_epochs):
        print(f"---- Beginning epoch {epoch} ----")
        model.train()
        progress_bar = tqdm(train_loader)
        # Train on an epoch of minibatches
        for X, y in progress_bar:
            X = X.to(device)
            y = y.to(device)
            y_hat = model(X)
            loss = F.cross_entropy(y_hat, y)
            progress_bar.set_postfix_str(f"train loss: {loss.item():.4f}")
            # Backpropagate, update the weights, and reset the gradients
            loss.backward()
            opt.step()
            opt.zero_grad()
        # Evaluate the model at the end of the epoch
        model.eval()
        num_right = 0
        eval_size = 0
        with torch.no_grad():
            for X, y in test_loader:
                X = X.to(device)
                y = y.to(device)
                y_hat = model(X)
                num_right += (y_hat.argmax(dim=1) == y).sum().item()
                eval_size += len(y)
        acc_percent = 100 * num_right / eval_size
        print(f"Epoch {epoch} validation accuracy: {acc_percent:.2f}%")
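
The accuracy bookkeeping in the evaluation step can be checked in isolation. The sketch below uses made-up logits and labels (illustrative values, not CIFAR-10 data) to show how `argmax` over the class dimension turns logits into predictions and how the running counts become a percentage.

```python
import torch

# Made-up logits for 4 samples and 3 classes (illustrative values only)
y_hat = torch.tensor([[2.0, 0.5, 0.1],
                      [0.2, 1.5, 0.3],
                      [0.1, 0.2, 3.0],
                      [1.0, 0.9, 0.8]])
y = torch.tensor([0, 1, 2, 2])  # true labels; the last sample is misclassified

# The predicted class is the index of the largest logit in each row
num_right = (y_hat.argmax(dim=1) == y).sum().item()
eval_size = len(y)
acc_percent = 100 * num_right / eval_size
print(f"{num_right}/{eval_size} correct, accuracy: {acc_percent:.2f}%")
```

Accumulating `num_right` and `eval_size` across batches, rather than averaging per-batch accuracies, keeps the result correct even when the final batch is smaller than the rest.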

Great. Now, let’s instantiate this baseline model and see how it fares on our dataset.

[ ]:
model = Net()
train_and_eval(model, trainloader, testloader)

Now that we have this baseline, let’s add algorithms to improve our data pipeline and model. We’ll start by adding some data augmentation, accessed via cf.colout_batch. (We can ignore the details on how ColOut works for the sake of this tutorial; you can check out the docs if you’d like to learn more.)

[ ]:
import composer.functional as cf # <-- Imports Composer's functional API

# create dataloaders for the train and test sets
shared_transforms = [
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
]

# Add ColOut to the transforms used during training
train_transforms = shared_transforms[:] + [cf.colout_batch]

test_transform = transforms.Compose(shared_transforms)
train_transform = transforms.Compose(train_transforms)

trainset = torchvision.datasets.CIFAR10(root=datadir, train=True,
                                        download=True, transform=train_transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root=datadir, train=False,
                                       download=True, transform=test_transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)
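
For intuition only, here is a simplified sketch of the idea behind ColOut: drop a random subset of rows and columns from an image, which shrinks it slightly while keeping most of its content. This is an illustration written under that assumption, not Composer's implementation; `simple_colout` and its parameters are hypothetical names, and the 0.15 drop fractions are illustrative values.

```python
import torch

def simple_colout(img, p_row=0.15, p_col=0.15):
    """Drop random rows and columns from a (C, H, W) image tensor."""
    _, height, width = img.shape
    keep_rows = int((1 - p_row) * height)
    keep_cols = int((1 - p_col) * width)
    # Choose which rows/columns survive, then sort to preserve their order
    rows = torch.randperm(height)[:keep_rows].sort().values
    cols = torch.randperm(width)[:keep_cols].sort().values
    return img[:, rows][:, :, cols]

img = torch.rand(3, 32, 32)      # stand-in for one CIFAR-10 image
out = simple_colout(img)
print(tuple(img.shape), "->", tuple(out.shape))
```

In this sketch every image loses the same number of rows and columns, so the samples in a batch still share one shape and can be stacked by the default collate function.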

Let’s see how our model does with just these changes.

[ ]:
model = Net()
# only use one data augmentation since our small model runs quickly
# and allows the dataloader little time to do anything fancy
train_and_eval(model, trainloader, testloader)

As we might expect, adding data augmentation doesn’t help us when we aren’t training long enough to start overfitting.

Let’s try using some algorithms that modify the model. We’re going to keep things simple and just add a Squeeze-and-Excitation module after the larger of the two Conv2d operations in our model. (Again, we can ignore what SqueezeExcite actually does, but feel free to check the docs to learn more.)

[ ]:
# squeeze-excite can add a lot of overhead for small
# conv2d operations, so only add it after convs with a
# minimum number of channels
cf.apply_squeeze_excite(model, latent_channels=64, min_channels=16)
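
For readers curious about what gets inserted, the sketch below shows the general Squeeze-and-Excitation idea: average each channel over its spatial dimensions, pass the result through a small two-layer network, and use the sigmoid output to rescale the channels. It is a minimal illustration, not Composer's implementation; `SimpleSE` is a hypothetical class name, and the sizes are chosen to mirror the 32-channel output of `conv2` and the `latent_channels=64` argument above.

```python
import torch
import torch.nn as nn

class SimpleSE(nn.Module):
    """Minimal Squeeze-and-Excitation block for (N, C, H, W) inputs."""

    def __init__(self, channels, latent_channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(channels, latent_channels)
        self.fc2 = nn.Linear(latent_channels, channels)

    def forward(self, x):
        n, c, _, _ = x.shape
        # Squeeze: one average value per channel
        s = self.pool(x).reshape(n, c)
        # Excite: per-channel gates in (0, 1)
        gates = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))
        # Rescale each channel of the input by its gate
        return x * gates.reshape(n, c, 1, 1)

se = SimpleSE(channels=32, latent_channels=64)
x = torch.rand(8, 32, 13, 13)    # stand-in for a batch of conv2 outputs
out = se(x)
print(tuple(out.shape))
```

The block leaves the tensor shape unchanged, which is why it can be placed after an existing convolution without touching the layers that follow.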

Now let’s see how our model does with the above algorithm applied.

[ ]:
train_and_eval(model, trainloader, testloader)

Adding squeeze-excite gives us another few percentage points of accuracy and does so with little decrease in the number of iterations per second. Great!

Of course, this is a toy model and dataset, but it serves to illustrate how to use Composer’s algorithms inside your own training loops, with minimal changes to your code.

What next?

You’ve now seen some examples of how to use our speed-up algorithms outside the Composer Trainer.

If you want to keep learning more, dig deeper into our functional API documentation, which includes a full list of available algorithm functions!

In addition, please continue to explore our tutorials!

Come get involved with MosaicML!

We’d love for you to get involved with the MosaicML community in any of these ways:

Star Composer on GitHub

Stay up-to-date and help make others aware of our work by starring Composer on GitHub.

Join the MosaicML Slack

Head on over to the MosaicML Slack to join other ML efficiency enthusiasts. Come for the paper discussions, stay for the memes!

Contribute to Composer

Is there a bug you noticed or a feature you’d like? File an issue or make a pull request!