🏙️ ResNet#

[How to Use] · [Architecture] · [Family Members] · [Default Training Hyperparameters] · [Attribution] · [API Reference]

Vision / Image Classification

The ResNet model family is a set of convolutional neural networks that can be used as a basis for a variety of vision tasks. Our implementation is a simple wrapper on top of the torchvision ResNet implementation.

How to Use#

from composer.models import composer_resnet

model = composer_resnet(
    model_name="resnet50",
    num_classes=1000,
    pretrained=False
)

Architecture#

The basic architecture defined in the original papers is as follows:

  • The first layer is a 7x7 Convolution with stride 2 and 64 filters.

  • Subsequent layers are organized into 4 stages with base widths of {64, 128, 256, 512} channels. The number of residual blocks in each stage depends on the family member. A 3x3 max pooling with stride 2 precedes the first stage, and the first block of each later stage halves the resolution using a convolution with stride 2.

  • The final section consists of a global average pooling followed by a linear + softmax layer that outputs values for the specified number of classes.
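As a rough illustration of the layout above, the sketch below traces the spatial resolution of a 224x224 input through the first layer and the four stages using the standard convolution output-size formula. The paddings, the 3x3 max pooling after the first layer, and the placement of the stride-2 convolution in the first block of stages 2 to 4 are assumptions taken from the torchvision implementation; `conv_out` and `trace_resolution` are hypothetical helpers, not part of Composer.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resolution(size: int = 224) -> list[tuple[str, int]]:
    """Return (layer name, spatial size) pairs for an ImageNet-style ResNet."""
    trace = []
    size = conv_out(size, kernel=7, stride=2, padding=3)  # 7x7 first convolution
    trace.append(("conv1", size))
    size = conv_out(size, kernel=3, stride=2, padding=1)  # 3x3 max pooling (assumed, per torchvision)
    trace.append(("maxpool", size))
    for stage, stride in enumerate((1, 2, 2, 2), start=1):
        # Stage 1 keeps the resolution; stages 2-4 halve it with a stride-2 convolution.
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        trace.append((f"stage{stage}", size))
    return trace


for name, size in trace_resolution():
    print(f"{name}: {size}x{size}")
```

With the default 224x224 input, the final stage produces a 7x7 feature map, which global average pooling then collapses to a single vector per image.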

The table below, from He et al., details some of the building blocks for ResNets of different sizes.

[Image: resnet.png]
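The number in each model name can be recovered from the per-stage block counts in the He et al. table. The following sketch counts weighted layers as the first convolution, the convolutions inside every residual block (2 per basic block, 3 per bottleneck block), and the final linear layer; shortcut projections are not counted. `STAGE_BLOCKS` and `depth` are hypothetical names used only for illustration.

```python
# Residual blocks per stage, and convolutions per block
# (2 for basic blocks, 3 for bottleneck blocks), following He et al.
STAGE_BLOCKS = {
    "resnet18": ((2, 2, 2, 2), 2),
    "resnet34": ((3, 4, 6, 3), 2),
    "resnet50": ((3, 4, 6, 3), 3),
    "resnet101": ((3, 4, 23, 3), 3),
    "resnet152": ((3, 8, 36, 3), 3),
}


def depth(model_name: str) -> int:
    """Count the weighted layers of a ResNet family member."""
    blocks, convs_per_block = STAGE_BLOCKS[model_name]
    # +2 accounts for the first 7x7 convolution and the final linear layer.
    return sum(blocks) * convs_per_block + 2


for name in STAGE_BLOCKS:
    print(name, depth(name))
```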

Family Members#

ResNet family members are identified by their number of layers. Parameter count, accuracy, and training time are provided below.

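| Model Family Members | Parameter Count | Our Accuracy | Training Time on 8xA100s |
|----------------------|-----------------|--------------|--------------------------|
| ResNet-18            | 11.7M           | TBA          | TBA                      |
| ResNet-34            | 21.8M           | TBA          | TBA                      |
| ResNet-50            | 25.6M           | 76.5%        | 3.83 hrs                 |
| ResNet-101           | 44.5M           | 78.1%        | 5.50 hrs                 |
| ResNet-152           | 60.2M           | TBA          | TBA                      |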

❗ Note: Please see the CIFAR ResNet model card for the differences between CIFAR and ImageNet ResNets.
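The parameter counts for the ImageNet models can be reproduced by hand. The sketch below assumes the torchvision layout: bias-free convolutions, batch normalization with a learned scale and shift after every convolution, a 1x1 projection shortcut wherever the shape changes, and a 1000-class linear head. `resnet_param_count` and its helpers are hypothetical names, not part of Composer or torchvision.

```python
def conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights of a bias-free 2D convolution."""
    return c_in * c_out * kernel * kernel


def bn_params(channels: int) -> int:
    """Learned scale and shift of a batch normalization layer."""
    return 2 * channels


def resnet_param_count(blocks, bottleneck: bool, num_classes: int = 1000) -> int:
    """Trainable parameters of an ImageNet-style ResNet (assumed torchvision layout)."""
    expansion = 4 if bottleneck else 1
    total = conv_params(3, 64, 7) + bn_params(64)  # first 7x7 convolution
    in_ch = 64
    for stage, (width, n_blocks) in enumerate(zip((64, 128, 256, 512), blocks)):
        out_ch = width * expansion
        for block in range(n_blocks):
            if bottleneck:
                total += conv_params(in_ch, width, 1) + bn_params(width)
                total += conv_params(width, width, 3) + bn_params(width)
                total += conv_params(width, out_ch, 1) + bn_params(out_ch)
            else:
                total += conv_params(in_ch, width, 3) + bn_params(width)
                total += conv_params(width, width, 3) + bn_params(width)
            # Projection shortcut when the first block changes stride or channels.
            if block == 0 and (stage > 0 or in_ch != out_ch):
                total += conv_params(in_ch, out_ch, 1) + bn_params(out_ch)
            in_ch = out_ch
    total += in_ch * num_classes + num_classes  # final linear layer with bias
    return total


print(resnet_param_count((2, 2, 2, 2), bottleneck=False))  # ResNet-18
print(resnet_param_count((3, 4, 6, 3), bottleneck=True))   # ResNet-50
```

Under these assumptions the count comes to 11,689,512 for ResNet-18 and 25,557,032 for ResNet-50.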

Default Training Hyperparameters#

optimizer:
  sgd:
    learning_rate: 2.048
    momentum: 0.875
    weight_decay: 5e-4
lr_schedulers:
  linear_warmup: "8ep"
  cosine_decay:
    T_max: "82ep"
    eta_min: 0
    verbose: false
    interval: step
train_batch_size: 2048
max_duration: 90ep
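One plausible reading of the schedule above: the learning rate ramps linearly up to 2.048 over the first 8 epochs, then follows a cosine curve down to `eta_min` over the next 82 epochs, which together span the 90-epoch `max_duration`. The sketch below assumes the warmup starts at zero and that the cosine phase begins once the warmup ends; Composer's actual scheduler composition may differ, and `lr_at` is a hypothetical helper.

```python
import math

PEAK_LR = 2.048      # optimizer.sgd.learning_rate
WARMUP_EPOCHS = 8    # lr_schedulers.linear_warmup
COSINE_EPOCHS = 82   # lr_schedulers.cosine_decay.T_max
ETA_MIN = 0.0        # lr_schedulers.cosine_decay.eta_min


def lr_at(epoch: float) -> float:
    """Learning rate at a (possibly fractional) epoch under the assumed schedule."""
    if epoch < WARMUP_EPOCHS:
        # Linear warmup from 0 (assumed starting value) to the peak rate.
        return PEAK_LR * epoch / WARMUP_EPOCHS
    # Cosine decay from the peak rate down to ETA_MIN.
    t = min(epoch - WARMUP_EPOCHS, COSINE_EPOCHS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / COSINE_EPOCHS))
    return ETA_MIN + (PEAK_LR - ETA_MIN) * cosine


for epoch in (0, 4, 8, 49, 90):
    print(f"epoch {epoch:>2}: lr = {lr_at(epoch):.4f}")
```

Because `interval` is `step`, the rate would be updated every batch rather than once per epoch, which is why the helper accepts fractional epochs.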

Attribution#

Paper: Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Code and hyperparameters: DeepLearningExamples GitHub repository by NVIDIA

API Reference#

composer.models.resnet.model.composer_resnet(model_name, num_classes=1000, pretrained=False, groups=1, width_per_group=64, initializers=None, loss_name='soft_cross_entropy')

Helper function to create a ComposerClassifier with a torchvision ResNet model.

From Deep Residual Learning for Image Recognition (He et al., 2015).

Parameters
  • model_name (str) – Name of the ResNet model instance. One of ["resnet18", "resnet34", "resnet50", "resnet101", "resnet152"].

  • num_classes (int, optional) – The number of classes. Needed for classification tasks. Default: 1000.

  • pretrained (bool, optional) – If True, use ImageNet pretrained weights. Default: False.

  • groups (int, optional) – Number of filter groups for the 3x3 convolution layer in bottleneck blocks. Default: 1.

  • width_per_group (int, optional) – Initial width for each convolution group. Width doubles after each stage. Default: 64.

  • initializers (List[Initializer], optional) – Initializers for the model. None for no initialization. Default: None.

  • loss_name (str, optional) – Loss function to use, e.g. 'soft_cross_entropy' or 'binary_cross_entropy_with_logits'. Loss function must be in loss. Default: 'soft_cross_entropy'.

Returns

ComposerModel – instance of ComposerClassifier with a torchvision ResNet model.

Example:

from composer.models import composer_resnet

model = composer_resnet(model_name='resnet18')  # creates a torchvision resnet18 for image classification
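To make the `groups` and `width_per_group` parameters more concrete, the sketch below computes the channel width of the 3x3 convolution in the bottleneck blocks of each stage. It mirrors the rule used in torchvision's bottleneck block, on the assumption that these arguments are passed through to torchvision unchanged. `bottleneck_widths` is a hypothetical helper, not a Composer or torchvision function.

```python
def bottleneck_widths(groups: int = 1, width_per_group: int = 64) -> list[int]:
    """Width of the 3x3 convolution in the bottleneck blocks of each of the 4 stages."""
    base_planes = (64, 128, 256, 512)  # doubles after each stage
    return [int(planes * (width_per_group / 64.0)) * groups for planes in base_planes]


print(bottleneck_widths())                              # defaults: plain ResNet
print(bottleneck_widths(groups=32, width_per_group=4))  # ResNeXt-style grouping
print(bottleneck_widths(width_per_group=128))           # wide ResNet-style
```

With the defaults the widths are 64, 128, 256, and 512. In torchvision, the basic blocks used by resnet18 and resnet34 accept only the default values, so non-default settings apply to the bottleneck models (resnet50 and deeper).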