# Distributed training


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

When using multiple GPUs, you will most likely want to fit your model
using distributed training.

Example use can be found:

- In the form of a script with
  [examples/distrib.py](https://github.com/fastai/fastai/blob/master/nbs/examples/distrib.py)
- Across all the App Examples with the [Notebook
  Launcher](https://docs.fast.ai/distributed_app_examples.html)
- At the bottom of this notebook for more examples with
  `notebook_launcher`.

To use distributed training, there are only three required steps:

1.  Add `with learn.distrib_ctx():` before your `learn.fit` call
2.  Either configure Accelerate yourself by running `accelerate config`
    from the command line, or run:

``` python
from accelerate.utils import write_basic_config
write_basic_config()
```

3.  Run your training script with
    `accelerate launch scriptname.py ...args...`
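
Putting the three steps together, a typical launch from a terminal
looks like the following (`train.py` and its `--epochs` flag are
hypothetical placeholders for your own script and arguments):

``` shell
# One-time setup: write a default Accelerate config
# (equivalent to accepting the defaults in `accelerate config`)
python -c "from accelerate.utils import write_basic_config; write_basic_config()"

# Launch the training script across all available GPUs
accelerate launch train.py --epochs 5
```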

If you’re using
[`untar_data`](https://docs.fast.ai/data.external.html#untar_data), or
may be downloading or uncompressing data or models as part of your
script, you should wrap that code with
[`rank0_first`](https://docs.fast.ai/distributed.html#rank0_first),
which runs that step just once on the master process before the
remaining processes run it in parallel. E.g. instead of:

``` python
path = untar_data(URLs.IMAGEWOOF_320)
```

…you instead use:

``` python
path = rank0_first(untar_data, URLs.IMAGEWOOF_320)
```

See below for details on the full API and underlying helper functions,
if needed – however, note that you will not need anything except the
above unless you need to change how the distributed training is
implemented.

## Parallel

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L19"
target="_blank" style="float:right; font-size:smaller">source</a>

### DataParallel.reset

``` python

def reset(
    
):

```

*Patch required `reset` call into `DataParallel`*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L24"
target="_blank" style="float:right; font-size:smaller">source</a>

### ParallelTrainer

``` python

def ParallelTrainer(
    device_ids
):

```

*Wrap a model `DataParallel` automatically*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L33"
target="_blank" style="float:right; font-size:smaller">source</a>

### Learner.to_parallel

``` python

def to_parallel(
    device_ids=None
):

```

*Add
[`ParallelTrainer`](https://docs.fast.ai/distributed.html#paralleltrainer)
callback to a [`Learner`](https://docs.fast.ai/learner.html#learner)*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L40"
target="_blank" style="float:right; font-size:smaller">source</a>

### Learner.detach_parallel

``` python

def detach_parallel(
    
):

```

*Remove
[`ParallelTrainer`](https://docs.fast.ai/distributed.html#paralleltrainer)
callback from a Learner*

------------------------------------------------------------------------

### parallel_ctx

``` python

def parallel_ctx(
    device_ids=None
):

```

*A context manager to adapt a learner to train in data parallel mode.*

## Distributed

### Helper functions

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L57"
target="_blank" style="float:right; font-size:smaller">source</a>

### DistributedDataParallel.reset

``` python

def reset(
    
):

```

*Patch required `reset` call into `DistributedDataParallel`*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L62"
target="_blank" style="float:right; font-size:smaller">source</a>

### setup_distrib

``` python

def setup_distrib(
    gpu=None
):

```

*Setup this process to participate in distributed training*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L71"
target="_blank" style="float:right; font-size:smaller">source</a>

### teardown_distrib

``` python

def teardown_distrib(
    
):

```

*Free distributed training resources*

### DataLoader

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L79"
target="_blank" style="float:right; font-size:smaller">source</a>

### DistributedDL

``` python

def DistributedDL(
    dl, rank=None, world_size=None, device=None
):

```

*A [`TfmdDL`](https://docs.fast.ai/data.core.html#tfmddl) which splits a
batch into equal size pieces for each worker*

``` python
dl = TfmdDL(list(range(50)), bs=12, num_workers=2)
for i in range(4):
    dl1 = DistributedDL(dl, i, 4)
    test_eq(list(dl1), (torch.arange(i*13, i*13+12)%50,torch.tensor([i*13+12])%50))
```
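
The arithmetic behind this split can be sketched in plain Python.
The `per_rank_indices` helper below is hypothetical (not part of
fastai), but mirrors the sharing scheme exercised by the test above:
each of the `world_size` ranks receives `ceil(n/world_size)` contiguous
indices, wrapping modulo the dataset size so every rank sees the same
number of items.

``` python
import math

def per_rank_indices(n_items, rank, world_size):
    # Hypothetical helper mirroring how DistributedDL shares out indices:
    # each rank gets ceil(n_items / world_size) contiguous indices,
    # wrapping modulo the dataset size so all ranks get equal shares.
    per_rank = math.ceil(n_items / world_size)
    start = rank * per_rank
    return [i % n_items for i in range(start, start + per_rank)]

# 50 items over 4 ranks -> 13 indices per rank; rank 3 wraps past the
# end, matching the torch.arange(i*13, i*13+12) % 50 pattern above.
print(per_rank_indices(50, 3, 4))
```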

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L140"
target="_blank" style="float:right; font-size:smaller">source</a>

### DistributedTrainer

``` python

def DistributedTrainer(
    sync_bn:bool=True, # Whether to replace all batch norm with `nn.SyncBatchNorm`
    device_placement:bool=True, split_batches:bool|None=None,
    gradient_accumulation_steps:int=1, cpu:bool=False, dataloader_config:DataLoaderConfiguration | None=None,
    deepspeed_plugin:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    fsdp_plugin:FullyShardedDataParallelPlugin | None=None,
    torch_tp_plugin:TorchTensorParallelPlugin | None=None, # Deprecate later, warning in `post_init`
    megatron_lm_plugin:MegatronLMPlugin | None=None, rng_types:list[str | RNGType] | None=None,
    project_dir:str | os.PathLike | None=None, project_config:ProjectConfiguration | None=None,
    gradient_accumulation_plugin:GradientAccumulationPlugin | None=None,
    kwargs_handlers:list[KwargsHandler] | None=None, dynamo_backend:DynamoBackend | str | None=None,
    dynamo_plugin:TorchDynamoPlugin | None=None,
    deepspeed_plugins:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    parallelism_config:ParallelismConfig | None=None
):

```

*Wrap `model` in `DistributedDataParallel` and `dls` in
[`DistributedDL`](https://docs.fast.ai/distributed.html#distributeddl)*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L168"
target="_blank" style="float:right; font-size:smaller">source</a>

### Learner.to_distributed

``` python

def to_distributed(
    sync_bn:bool=True, # Whether to replace all batch norm with `nn.SyncBatchNorm`
    device_placement:bool=True, split_batches:bool|None=None,
    gradient_accumulation_steps:int=1, cpu:bool=False, dataloader_config:DataLoaderConfiguration | None=None,
    deepspeed_plugin:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    fsdp_plugin:FullyShardedDataParallelPlugin | None=None,
    torch_tp_plugin:TorchTensorParallelPlugin | None=None, # Deprecate later, warning in `post_init`
    megatron_lm_plugin:MegatronLMPlugin | None=None, rng_types:list[str | RNGType] | None=None,
    project_dir:str | os.PathLike | None=None, project_config:ProjectConfiguration | None=None,
    gradient_accumulation_plugin:GradientAccumulationPlugin | None=None,
    kwargs_handlers:list[KwargsHandler] | None=None, dynamo_backend:DynamoBackend | str | None=None,
    dynamo_plugin:TorchDynamoPlugin | None=None,
    deepspeed_plugins:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    parallelism_config:ParallelismConfig | None=None
):

```

*Add `AcceleratedTrainer` to a learner and configure an `Accelerator`*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L179"
target="_blank" style="float:right; font-size:smaller">source</a>

### Learner.detach_distributed

``` python

def detach_distributed(
    
):

```

*Remove
[`DistributedTrainer`](https://docs.fast.ai/distributed.html#distributedtrainer)
from a learner*

### `distrib_ctx` context manager

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L190"
target="_blank" style="float:right; font-size:smaller">source</a>

### Learner.distrib_ctx

``` python

def distrib_ctx(
    sync_bn:bool=True, # Whether to replace all batch norm with `nn.SyncBatchNorm`
    in_notebook:bool=False, # Whether we are launching from a notebook or not
    device_placement:bool=True, split_batches:bool|None=None,
    gradient_accumulation_steps:int=1, cpu:bool=False, dataloader_config:DataLoaderConfiguration | None=None,
    deepspeed_plugin:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    fsdp_plugin:FullyShardedDataParallelPlugin | None=None,
    torch_tp_plugin:TorchTensorParallelPlugin | None=None, # Deprecate later, warning in `post_init`
    megatron_lm_plugin:MegatronLMPlugin | None=None, rng_types:list[str | RNGType] | None=None,
    project_dir:str | os.PathLike | None=None, project_config:ProjectConfiguration | None=None,
    gradient_accumulation_plugin:GradientAccumulationPlugin | None=None,
    kwargs_handlers:list[KwargsHandler] | None=None, dynamo_backend:DynamoBackend | str | None=None,
    dynamo_plugin:TorchDynamoPlugin | None=None,
    deepspeed_plugins:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    parallelism_config:ParallelismConfig | None=None
):

```

*A context manager to adapt a learner to train in distributed data
parallel mode.*

`distrib_ctx` prepares a learner to train in distributed data parallel
mode. It assumes the script/code will either be run from the command
line via `accelerate launch` or through the `notebook_launcher` function
from Accelerate. It also assumes that `accelerate` has been configured
by either running `write_basic_config()` or calling `accelerate config`
through the CLI and answering the prompts.

Typical usage:

    with learn.distrib_ctx(): learn.fit(.....)

Entering the context attaches a
[`DistributedTrainer`](https://docs.fast.ai/distributed.html#distributedtrainer)
callback and
[`DistributedDL`](https://docs.fast.ai/distributed.html#distributeddl)
data loader to the learner, so that the `learn.fit(.....)` call inside
it runs distributed. Upon exiting the context, it removes the
[`DistributedTrainer`](https://docs.fast.ai/distributed.html#distributedtrainer)
and
[`DistributedDL`](https://docs.fast.ai/distributed.html#distributeddl),
and destroys any locally created distributed process group. The process
is still attached to the GPU though.
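
The attach-on-enter, remove-on-exit behaviour can be sketched with a
plain context manager. The `Learner` stand-in and `distrib_ctx_sketch`
below are illustrative only, not fastai's implementation:

``` python
from contextlib import contextmanager

class Learner:
    # Minimal stand-in for fastai's Learner, just enough to hold callbacks.
    def __init__(self): self.cbs = []
    def add_cb(self, cb): self.cbs.append(cb)
    def remove_cb(self, cb): self.cbs.remove(cb)

@contextmanager
def distrib_ctx_sketch(learn, cb):
    # Attach the trainer callback on entry; remove it again on exit,
    # which is also where any process-group teardown would happen.
    learn.add_cb(cb)
    try:
        yield learn
    finally:
        learn.remove_cb(cb)

learn = Learner()
with distrib_ctx_sketch(learn, 'DistributedTrainer'):
    print('DistributedTrainer' in learn.cbs)  # True: active while fitting
print(learn.cbs)  # []: removed once the context exits
```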

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L216"
target="_blank" style="float:right; font-size:smaller">source</a>

### rank0_first

``` python

def rank0_first(
    func, *args, **kwargs
):

```

*Execute `func` in the Rank-0 process first, then in other ranks in
parallel.*

[`rank0_first`](https://docs.fast.ai/distributed.html#rank0_first) calls
`func` in the rank-0 process first, then in parallel on the rest, in
distributed training mode. In single-process, non-distributed training
mode, `func` is called only once as expected.

One application of
[`rank0_first()`](https://docs.fast.ai/distributed.html#rank0_first) is
to make fresh downloads via
[`untar_data`](https://docs.fast.ai/data.external.html#untar_data) safe
in distributed training scripts launched by
`python -m fastai.launch <script>`:

<code>path = untar_data(URLs.IMDB)</code>

becomes:

<code>path = rank0_first(lambda: untar_data(URLs.IMDB))</code>

Some learner factory methods may use
[`untar_data`](https://docs.fast.ai/data.external.html#untar_data) to
download pretrained models:

<code>learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5,
metrics=accuracy)</code>

becomes:

<code>learn = rank0_first(lambda: text_classifier_learner(dls, AWD_LSTM,
drop_mult=0.5, metrics=accuracy))</code>

Otherwise, multiple processes will download at the same time and corrupt
the data.
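
The run-on-rank-0-first pattern amounts to a barrier: rank 0 executes
the function before the barrier, everyone else only after it. A
thread-based simulation (hypothetical names, not fastai's process-based
implementation):

``` python
import threading

calls = []               # records the order in which "ranks" run the function
lock = threading.Lock()

def download(rank):
    with lock:
        calls.append(rank)   # stand-in for untar_data's download step

def rank0_first_sketch(func, rank, barrier):
    # Rank 0 runs func before reaching the barrier; the other ranks wait
    # at the barrier and only run func once rank 0 has finished.
    if rank == 0:
        func(rank)
        barrier.wait()
    else:
        barrier.wait()
        func(rank)

barrier = threading.Barrier(4)
threads = [threading.Thread(target=rank0_first_sketch, args=(download, r, barrier))
           for r in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(calls[0])  # 0: rank 0 always completes the download first
```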

## Notebook Launcher

Accelerate provides a
[notebook_launcher](https://huggingface.co/docs/accelerate/launcher)
functionality to let you keep using your Jupyter Notebook as you would,
but train in a distributed setup!

First, make sure accelerate is properly configured. You can either run
`accelerate config` from the command line, or have an autofilled
configuration setup by running in the first cell of your notebook:

``` python
from accelerate.utils import write_basic_config
write_basic_config()
```

After Accelerate is configured, to use the `notebook_launcher`
functionality move your training code into a function and pass it to
`notebook_launcher`, such as:

``` python
from fastai.vision.all import *
from fastai.distributed import *

def train():
    set_seed(99, True)
    path = untar_data(URLs.PETS)/'images'
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))

    learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()
    with learn.distrib_ctx(in_notebook=True):
        learn.fine_tune(1)
```

Then, in a separate cell, launch training across the desired number of
processes:

``` python
from accelerate import notebook_launcher
notebook_launcher(train, num_processes=2)
```
