parallelize_composer_model#

composer.distributed.parallelize_composer_model(composer_model, optimizer, config)[source]#

Prepare a ComposerModel for distributed training.

NOTE: parallelization is applied to each of the composer model's submodules to remain compatible with models defined for FSDP1. This is not strictly necessary for FSDP2, which relies on DTensor: a module whose parameters are sharded remains functional even if it is not itself wrapped with FSDP2, though it may be less performant due to the lack of grouped prefetching and similar optimizations.

Advanced users who need more flexible control over fsdp_wrap_policy or activation_checkpointing_check_fn should use parallelize_model directly.

Parameters
  • composer_model (ComposerModel) – The ComposerModel to prepare for distributed training.

  • optimizer (Optional[Optimizer]) – The optimizer to use for distributed training.

  • config (FSDP2Config) – The configuration for distributed training. Currently only FSDP2Config is supported.
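
Example (a minimal sketch): the import locations for ComposerClassifier and FSDP2Config are assumptions, FSDP2Config is constructed with its defaults, and the call is expected to run under an initialized distributed process group (e.g. launched with the composer or torchrun launcher).

    import torch
    from composer.distributed import parallelize_composer_model
    from composer.models import ComposerClassifier        # assumed import path; any ComposerModel works
    from composer.utils.parallelism import FSDP2Config    # assumed import path

    # Wrap a plain nn.Module in a ComposerModel (a tiny classifier, purely for illustration).
    module = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
    composer_model = ComposerClassifier(module, num_classes=2)

    # Create the optimizer before parallelization; passing it lets the function keep
    # its parameter references consistent with the sharded model.
    optimizer = torch.optim.AdamW(composer_model.parameters(), lr=1e-4)

    # Shard the model in place with FSDP2 using the default configuration.
    parallelize_composer_model(composer_model, optimizer, FSDP2Config())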