DDPSyncStrategy#
- class composer.distributed.DDPSyncStrategy(value)[source]#
How and when gradient synchronization should happen.
- SINGLE_AUTO_SYNC#
The default behavior. Gradients are synchronized as they are computed, for only the final microbatch of a batch. This is the most efficient strategy, but it can lead to errors when find_unused_parameters is set, since different microbatches may use different sets of parameters, leading to an incomplete sync (see the first sketch after this list).
- MULTI_AUTO_SYNC#
The default behavior when find_unused_parameters is set. Gradients are synchronized as they are computed for all microbatches. This ensures complete synchronization, but it is less efficient than SINGLE_AUTO_SYNC. The efficiency gap is usually small, as long as either DDP syncs are a small portion of the trainer's overall runtime or the number of microbatches per batch is relatively small (see the second sketch after this list).
- FORCED_SYNC#
Gradients are manually synchronized only after all gradients have been computed for the final microbatch of a batch. Like MULTI_AUTO_SYNC, this strategy ensures complete gradient synchronization, but it tends to be slower than MULTI_AUTO_SYNC. This is because syncs can ordinarily happen in parallel with the loss.backward() computation, meaning they can be mostly complete by the time that call finishes. However, when syncs take a very long time to complete and there are also many microbatches per batch, this strategy may be optimal (see the last sketch after this list).
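As a rough illustration, the sketch below shows the pattern SINGLE_AUTO_SYNC corresponds to, using PyTorch's DistributedDataParallel.no_sync() context manager to suppress the automatic gradient all-reduce for every microbatch except the last. This is a minimal sketch of the pattern, not Composer's internal implementation; ddp_model, microbatches, loss_fn, and optimizer are placeholder names.

```python
import contextlib

# Assumes ddp_model is a torch.nn.parallel.DistributedDataParallel wrapper, and
# microbatches, loss_fn, and optimizer are defined elsewhere (placeholders).
def train_batch_single_auto_sync(ddp_model, microbatches, loss_fn, optimizer):
    n = len(microbatches)
    for i, (x, y) in enumerate(microbatches):
        is_last = i == n - 1
        # no_sync() skips DDP's gradient all-reduce, so only the last
        # microbatch's backward pass triggers communication.
        ctx = contextlib.nullcontext() if is_last else ddp_model.no_sync()
        with ctx:
            loss = loss_fn(ddp_model(x), y) / n
            loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```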
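MULTI_AUTO_SYNC, by contrast, lets DDP run its automatic all-reduce on every microbatch's backward pass, which is why no parameter is ever left out of the sync. A minimal sketch of that pattern, under the same placeholder assumptions as above:

```python
def train_batch_multi_auto_sync(ddp_model, microbatches, loss_fn, optimizer):
    n = len(microbatches)
    for x, y in microbatches:
        loss = loss_fn(ddp_model(x), y) / n
        # DDP overlaps a gradient all-reduce with every backward pass,
        # so each microbatch pays some communication cost.
        loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```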
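FORCED_SYNC corresponds to skipping DDP's automatic all-reduce entirely and synchronizing the accumulated gradients by hand once the final microbatch has been processed. The sketch below uses torch.distributed.all_reduce directly; it illustrates the pattern only and is not Composer's implementation.

```python
import torch.distributed as dist

def train_batch_forced_sync(ddp_model, microbatches, loss_fn, optimizer):
    n = len(microbatches)
    for x, y in microbatches:
        # Accumulate local gradients only; no communication happens here.
        with ddp_model.no_sync():
            loss = loss_fn(ddp_model(x), y) / n
            loss.backward()
    # Manually all-reduce and average the gradients after the final microbatch.
    world_size = dist.get_world_size()
    for p in ddp_model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad)  # default reduce op is SUM
            p.grad /= world_size
    optimizer.step()
    optimizer.zero_grad()
```

Because this manual all-reduce runs after loss.backward() rather than overlapping with it, it blocks the training loop, which is why FORCED_SYNC is usually slower unless the automatic per-microbatch syncs are themselves very expensive.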