🥡 Exporting for Inference#
Now that you’ve trained using Composer, do you need to make your model available for inference? We’ve got you covered.
Composer provides model export support for inference using a dedicated export API and a callback. In this tutorial, we walk through how to export your models into various common formats (e.g., ONNX and TorchScript) using the dedicated export API as well as Composer’s callback mechanism. Composer models can also be exported like any other PyTorch module, since Composer models are torch.nn.Module instances.
For more detailed options and configuration settings, please consult the linked documentation.
Recommended Background#
This tutorial assumes that you’re familiar with basic export formats like ONNX and TorchScript, and that you’re generally up to speed on using Composer for training. If you haven’t already, you may find it helpful to review our callback docs and checkpointing docs.
Tutorial Goals and Covered Concepts#
The goal of this tutorial is to showcase Composer’s export utilities for making a model available for inference.
We’ll touch on exporting with the standalone export API, exporting automatically at the end of training with an export callback, exporting directly from the trainer or from a saved checkpoint, and tracing models with torch.fx.
Let’s get started!
Prerequisites#
First, we install Composer:
[ ]:
%pip install mosaicml
# To install from source instead of the last release, comment the command above and uncomment the following one.
# %pip install git+https://github.com/mosaicml/composer.git
Create the Model#
To start, we create the model we’d like to export, which in this case is ResNet-50 with our SqueezeExcite algorithm applied. This algorithm adds SqueezeExcite modules after certain Conv2d layers.
[ ]:
from torchvision.models import resnet
from composer.models import ComposerClassifier
import composer.functional as cf
model = ComposerClassifier(module=resnet.resnet50(), num_classes=1000)
cf.apply_squeeze_excite(model)
# switch to eval mode
model.eval()
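Since a ComposerClassifier is a torch.nn.Module, you can also treat it like any ordinary PyTorch module, as mentioned in the introduction. As a minimal illustrative sketch (the file name below is arbitrary and not part of this tutorial), you could save and reload its weights directly with plain PyTorch:

import torch

# Plain PyTorch save/load of the model weights; no Composer export machinery involved.
torch.save(model.state_dict(), 'resnet50_weights.pt')
model.load_state_dict(torch.load('resnet50_weights.pt'))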
Torchscript Export Using Standalone API#
TorchScript creates models from PyTorch code that can be saved and optimized for deployment, and the tooling is native to PyTorch.
The ComposerClassifier’s forward method takes as input a pair of tensors (input, label), so we create dummy tensors to run the model.
[ ]:
import torch
input = (torch.rand(4, 3, 224, 224), torch.Tensor())
output = model(input)
Now we run export using our standalone export API. Composer also supports exporting to an object store such as S3. For more info on using an object store, please check out our full documentation for the export_for_inference API.
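As an aside, a rough sketch of an object-store export might look like the snippet below. The bucket name is hypothetical, the save_object_store argument should be verified against the linked export_for_inference documentation, and AWS credentials are assumed to be configured in your environment:

from composer.utils import S3ObjectStore, export_for_inference

# Hypothetical bucket; assumes AWS credentials are already configured in the environment.
object_store = S3ObjectStore(bucket='my-inference-bucket')
export_for_inference(model=model,
                     save_format='torchscript',
                     save_path='exports/model.pt',  # key inside the bucket
                     save_object_store=object_store)

The next cell performs the local-filesystem export that we use for the rest of this tutorial.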
[ ]:
import os
import tempfile
from composer.utils import export_for_inference
save_format = 'torchscript'
working_dir = tempfile.TemporaryDirectory()
model_save_path = os.path.join(working_dir.name, 'model.pt')
export_for_inference(model=model,
                     save_format=save_format,
                     save_path=model_save_path)
Check to make sure that the model exists in our working directory.
[ ]:
print(os.listdir(path=working_dir.name))
Reload the saved model and run inference on it. We’ll also compare the results with the previously computed results on the same input as a sanity check.
[ ]:
scripted_model = torch.jit.load(model_save_path)
scripted_model.eval()
scripted_output = scripted_model(input)
print(torch.allclose(output, scripted_output))
Export Using a Callback#
The Composer trainer also lets you specify an export callback that automatically exports the model at the end of training. Since we will be training a model for a few epochs, we’ll first create a dataloader with MNIST for this tutorial.
[ ]:
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
mnist_transforms = transforms.Compose([transforms.ToTensor()])
dataset = datasets.MNIST("./data", train=True, download=True, transform=mnist_transforms)
dataloader = DataLoader(dataset=dataset, batch_size=4)
input_mnist = (torch.rand(4, 1, 28, 28), torch.Tensor())
Create the Model#
We create the model we are training, which in this case is a small convolutional network for MNIST.
[ ]:
import torch.nn as nn
import torch.nn.functional as F
from composer.models import ComposerClassifier
class ToyModel(nn.Module):
    """Toy convolutional neural network architecture in pytorch for MNIST."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.num_classes = num_classes
        self.conv1 = nn.Conv2d(1, 16, (3, 3), padding=0)
        self.conv2 = nn.Conv2d(16, 32, (3, 3), padding=0)
        self.bn = nn.BatchNorm2d(32)
        self.fc1 = nn.Linear(32 * 16, 32)
        self.fc2 = nn.Linear(32, num_classes)

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu(out)
        out = self.conv2(out)
        out = self.bn(out)
        out = F.relu(out)
        out = F.adaptive_avg_pool2d(out, (4, 4))
        out = torch.flatten(out, 1, -1)
        out = self.fc1(out)
        out = F.relu(out)
        return self.fc2(out)

model = ComposerClassifier(module=ToyModel(num_classes=10))
Create the Export Callback#
Now we create a callback that the trainer uses to export the model for inference. Since we already saw TorchScript export using Composer’s standalone export API, we’ll use onnx as our export format for this section to showcase both capabilities. You can easily choose between these options by setting save_format to either 'onnx' or 'torchscript'.
Note: ONNX does not yet have a prebuilt wheel for Mac M1/M2 chips, so it is not pip-installable on recent Mac computers. Skip this section if your computer has an M1/M2 chip.
[ ]:
import composer.functional as cf
from composer.callbacks import ExportForInferenceCallback
# change to 'torchscript' for exporting to torchscript format
save_format = 'onnx'
model_save_path = os.path.join(working_dir.name, 'model1.onnx')
export_callback = ExportForInferenceCallback(save_format=save_format, save_path=model_save_path)
Run Training#
Now we construct the trainer using this callback. The model is exported at the end of training. Later in this tutorial we show exporting a model from a checkpoint, so we also supply the trainer’s save_folder and save_interval arguments to save some checkpoints.
[ ]:
import torch
from composer import Trainer
from composer.algorithms import SqueezeExcite
from composer.optim import DecoupledSGDW
optimizer = DecoupledSGDW(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
trainer = Trainer(
    model=model,
    train_dataloader=dataloader,
    optimizers=optimizer,
    schedulers=scheduler,
    save_folder=working_dir.name,
    algorithms=[SqueezeExcite()],
    callbacks=[export_callback],
    max_duration='2ep',
    save_interval='1ep',
    save_overwrite=True,
)
trainer.fit()
Let’s list the contents of the working_dir to check that the checkpoints and exported model are available.
[ ]:
print(os.listdir(path=working_dir.name))
Exporting from Trainer Directly#
You can also export directly from the trainer after training, without setting up a callback, by calling its export_for_inference method.
[ ]:
model_save_path = os.path.join(working_dir.name, 'model2.onnx')
trainer.export_for_inference(save_format='onnx', save_path=model_save_path)
Similarly, let’s list the contents of the working_dir to see if this exported model is available.
[ ]:
print(os.listdir(path=working_dir.name))
Load and Run Exported ONNX Model#
[ ]:
%pip install onnx
%pip install onnxruntime
Let’s load the model and check that everything was exported properly.
[ ]:
import onnx
onnx_model = onnx.load(model_save_path)
onnx.checker.check_model(onnx_model)
Lastly, we can run inference with the model and check that the model indeed runs.
[ ]:
import onnxruntime as ort
import numpy as np
# run inference
ort_session = ort.InferenceSession(model_save_path, providers=['CPUExecutionProvider'])
outputs = ort_session.run(
    None,
    {'input': input_mnist[0].numpy()})
print(f"The predicted classes are {np.argmax(outputs[0], axis=1)}")
If our input is a dictionary, as is often the case when using a Composer HuggingFaceModel, we’ll need to make sure all the elements of our input dictionary are numpy arrays before calling ort_session.run().
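For example, a minimal sketch of that conversion might look like the following (the batch dictionary and its keys are hypothetical stand-ins for a tokenized input and are not produced anywhere in this tutorial):

import torch

# Hypothetical tokenized batch, e.g. from a HuggingFace tokenizer (illustrative keys only)
batch = {
    'input_ids': torch.randint(0, 1000, (4, 128)),
    'attention_mask': torch.ones(4, 128, dtype=torch.long),
}
# Convert every tensor in the dictionary to a numpy array before handing it to onnxruntime
ort_inputs = {name: tensor.numpy() for name, tensor in batch.items()}
# ort_session.run(None, ort_inputs)  # assumes the exported graph uses matching input names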
Note: Since the model is randomly initialized, and the input tensor is random, the output classes in this example have no meaning.
Exporting from an Existing Checkpoint#
In this part of the tutorial, we will look at exporting a model from a previously created checkpoint that is stored locally. Composer also supports exporting from a checkpoint stored in an object store such as S3. Please check out the full documentation for the export_for_inference API for using an object store.
Some of our algorithms alter the model architecture. For example, SqueezeExcite adds a channel-wise attention operator in CNNs and modifies the model architecture. Therefore, we need to provide a function that takes the model and applies the algorithm before we can load the model weights from a checkpoint. The functional form of SqueezeExcite does exactly that, and we pass this function in the surgery_algs argument to the export_for_inference API.
[ ]:
print(os.listdir(working_dir.name))
[ ]:
from composer.utils import export_for_inference
# We call it model3.onnx to make it different from our previous exports
model_save_path = os.path.join(working_dir.name, 'model3.onnx')
checkpoint_path = os.path.join(working_dir.name, 'ep2-ba4-rank0.pt')
model = ComposerClassifier(module=ToyModel(num_classes=10))
export_for_inference(model=model,
                     save_format=save_format,
                     save_path=model_save_path,
                     sample_input=(input_mnist, {}),
                     surgery_algs=[cf.apply_squeeze_excite],
                     load_path=checkpoint_path)
Let’s list the contents of the working_dir to check if the newly exported model is available.
[ ]:
print(os.listdir(path=working_dir.name))
Make sure the model loaded from a checkpoint produces the same results as before.
[ ]:
ort_session = ort.InferenceSession(model_save_path, providers=['CPUExecutionProvider'])
new_outputs = ort_session.run(
    None,
    {'input': input_mnist[0].numpy()},
)
print(np.allclose(outputs[0], new_outputs[0], atol=1e-07))
[ ]:
# Clean up working directory
working_dir.cleanup()
Torch.fx#
FX is a recent toolkit for transforming PyTorch modules that allows for advanced graph manipulation and code generation. Eventually, PyTorch will add quantization and other optimization procedures on top of FX (e.g., see FX Graph Mode Quantization). Composer is also starting to add algorithms that use torch.fx for graph optimization, so look forward to more of these in the future!
Tracing a model with torch.fx is fairly straightforward:
[ ]:
traced_model = torch.fx.symbolic_trace(model)
Then, we can see all the nodes in the graph:
[ ]:
traced_model.graph.print_tabular()
And also run inference:
[ ]:
output = traced_model(input_mnist)
print(f"The predicted classes are {torch.argmax(output, dim=1)}")
torch.fx is powerful, but one of its key limitations is that it does not support dynamic control flow (e.g., if statements or loops that are data-dependent). Therefore, some algorithms, such as BlurPool, are currently not supported. We have ongoing work to bring torch.fx support to all of our algorithms.
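To make that limitation concrete, here is a small, self-contained illustration (not part of the original tutorial) of the kind of data-dependent control flow that symbolic tracing cannot handle:

import torch
import torch.nn as nn

class DataDependentModule(nn.Module):
    """Toy module whose forward pass branches on the values in its input tensor."""
    def forward(self, x):
        # The branch depends on the runtime values of x, which symbolic tracing cannot resolve
        if x.sum() > 0:
            return x * 2
        return x - 1

try:
    torch.fx.symbolic_trace(DataDependentModule())
except Exception as e:  # typically a torch.fx.proxy.TraceError
    print(f'Tracing failed as expected: {e}')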
Algorithm Compatibility#
Some of our algorithms alter the model architecture in ways that may render them incompatible with some of the export procedures above. For example, BlurPool replaces some instances of Conv2d with BlurConv2d layers, which are not yet compatible with torch.fx.
The following table shows which algorithms are compatible with which export formats for inference.
| | torchscript | torch.fx | ONNX |
|---|---|---|---|
| apply_blurpool | ✓ | | ✓ |
| apply_factorization | ✓ | | ✓ |
| apply_ghost_batchnorm | ✓ | | ✓ |
| apply_squeeze_excite | ✓ | ✓ | ✓ |
| apply_stochastic_depth | ✓ | ✓ | ✓ |
| apply_channels_last | ✓ | ✓ | ✓ |
What next?#
You’ve now seen all the ways that Composer enables you to make your trained models available for downstream inference.
To keep learning more, please continue to explore our tutorials! Here’s a suggestion:
Check out our beta support for training on TPUs.
Come get involved with MosaicML!#
We’d love for you to get involved with the MosaicML community in any of these ways:
Star Composer on GitHub#
Help make others aware of our work by starring Composer on GitHub.
Join the MosaicML Slack#
Head on over to the MosaicML Slack to join other ML efficiency enthusiasts. Come for the paper discussions, stay for the memes!
Contribute to Composer#
Is there a bug you noticed or a feature you’d like? File an issue or make a pull request!