FSDP2Config

class composer.utils.FSDP2Config(device_mesh=None, reshard_after_forward=True, activation_checkpointing=False, activation_cpu_offload=False, state_dict_type='sharded', load_monolith_rank0_only=False, mixed_precision='DEFAULT', verbose=False)

Configuration for Fully Sharded Data Parallelism (FSDP2).

Parameters
  • device_mesh (Optional[DeviceMesh]) – The DeviceMesh for sharding. If None, a default 1D mesh is created. For 1D mesh, parameters are fully sharded across the mesh (FSDP). For 2D mesh, parameters are sharded across the 1st dimension and replicated across the 0th dimension (HSDP).

  • reshard_after_forward (Union[bool, int]) – Controls whether parameters are resharded after the forward pass. Defaults to True.

  • activation_checkpointing (bool) – Whether to use activation checkpointing. Defaults to False.

  • activation_cpu_offload (bool) – Whether to use activation CPU offloading. Defaults to False.

  • state_dict_type (str) – Type of state dict to use. Can be 'full' or 'sharded'. Defaults to 'sharded'.
      Note: If load_path is not set in Trainer, state_dict_type determines how the model is saved.
      Note: If load_path is set in Trainer, state_dict_type determines how the model is both loaded and saved.

  • load_monolith_rank0_only (bool) – Whether to load monolithic checkpoints on rank 0 only. Defaults to False.
      Note: When load_monolith_rank0_only is True and load_path is set in Trainer, state_dict_type must be 'full'.

  • mixed_precision (str) – Mixed precision to use. Can be 'DEFAULT', 'PURE', or 'FULL'. Defaults to 'DEFAULT'.

  • verbose (bool) – Whether to print verbose output. Defaults to False.
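
The value constraints in the parameter notes above can be checked before a config is built. The following is a minimal sketch in plain Python, not Composer's own validation logic; the helper name `check_fsdp2_kwargs` and its `load_path` argument are hypothetical and exist only for illustration.

```python
from typing import Any, Optional


def check_fsdp2_kwargs(kwargs: dict[str, Any], load_path: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical pre-flight check mirroring the documented parameter notes."""
    # Fall back to the documented defaults when a key is absent.
    state_dict_type = kwargs.get('state_dict_type', 'sharded')
    if state_dict_type not in ('full', 'sharded'):
        raise ValueError(f"state_dict_type must be 'full' or 'sharded', got {state_dict_type!r}")

    mixed_precision = kwargs.get('mixed_precision', 'DEFAULT')
    if mixed_precision not in ('DEFAULT', 'PURE', 'FULL'):
        raise ValueError(f"mixed_precision must be 'DEFAULT', 'PURE', or 'FULL', got {mixed_precision!r}")

    # Documented note: rank-0-only monolithic loading with load_path set requires 'full'.
    rank0_only = kwargs.get('load_monolith_rank0_only', False)
    if rank0_only and load_path is not None and state_dict_type != 'full':
        raise ValueError("load_monolith_rank0_only=True with load_path set requires state_dict_type='full'")

    return kwargs


# Example: a combination that satisfies the documented constraint.
checked = check_fsdp2_kwargs(
    {'state_dict_type': 'full', 'load_monolith_rank0_only': True},
    load_path='path/to/checkpoint',  # placeholder path, for illustration only
)
```

With Composer installed, a dictionary that passes this check could then be unpacked into the constructor, for example `FSDP2Config(**checked)`, since every key matches a parameter in the signature above.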

classmethod from_compatible_attrs(attrs)

Create an FSDP2Config by filtering FSDP2 compatible attributes from given attrs.

Only attributes that are valid for FSDP2Config are used; a warning is issued for each attribute that cannot be transferred. The method therefore accepts both FSDP1 and FSDP2 attributes, and its main use case is backward compatibility with FSDP1 configurations.

Parameters

attrs (dict[str, Any]) – Dictionary of FSDP1/2 configuration attributes.

Returns

FSDP2Config – A new FSDP2Config instance with compatible attributes.

Warning

UserWarning – If an attribute in the input dictionary is not a settable attribute of FSDP2Config. Such attributes are ignored.
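
The filtering behavior described for from_compatible_attrs can be illustrated with a small stand-alone sketch. This is not Composer's implementation: `SETTABLE_ATTRS` is an assumed stand-in for the result of `FSDP2Config.settable_attrs()`, populated here from the constructor signature, and `filter_compatible_attrs` is a hypothetical helper name.

```python
import warnings
from typing import Any

# Assumption: stand-in for FSDP2Config.settable_attrs(), taken from the constructor signature.
SETTABLE_ATTRS = {
    'device_mesh',
    'reshard_after_forward',
    'activation_checkpointing',
    'activation_cpu_offload',
    'state_dict_type',
    'load_monolith_rank0_only',
    'mixed_precision',
    'verbose',
}


def filter_compatible_attrs(attrs: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: keep settable keys and warn about every other key."""
    compatible: dict[str, Any] = {}
    for name, value in attrs.items():
        if name in SETTABLE_ATTRS:
            compatible[name] = value
        else:
            # Unsupported attributes are dropped with a UserWarning, as documented.
            warnings.warn(
                f"Attribute '{name}' is not a settable attribute of FSDP2Config and will be ignored.",
                UserWarning,
            )
    return compatible


# Example: an FSDP1-style dictionary where only one key carries over.
fsdp1_style = {'sharding_strategy': 'FULL_SHARD', 'activation_checkpointing': True}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    kept = filter_compatible_attrs(fsdp1_style)
```

In this sketch `kept` holds only `activation_checkpointing`, and `caught` records a single UserWarning for `sharding_strategy`. The real classmethod additionally constructs and returns an FSDP2Config instance from the attributes it keeps.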

classmethod settable_attrs()

Return a set of all settable attributes of FSDP2Config.