# Distributed training


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

When using multiple GPUs, you will most likely want to fit your model
using distributed training.

Example use can be found:

- In the form of a script with
  [examples/distrib.py](https://github.com/fastai/fastai/blob/master/nbs/examples/distrib.py)
- Across all the App Examples with the [Notebook
  Launcher](https://docs.fast.ai/distributed_app_examples.html)
- At the bottom of this notebook for more examples with
  `notebook_launcher`.

To use distributed training, there are only three required steps:

1.  Add `with learn.distrib_ctx():` before your `learn.fit` call
2.  Either configure Accelerate yourself by running `accelerate config`
    from the command line, or run:

``` python
from accelerate.utils import write_basic_config
write_basic_config()
```

3.  Run your training script with
    `accelerate launch scriptname.py ...args...`
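
Putting the three steps together, a typical launch from a terminal
looks like the following (`train.py` and its `--epochs` flag are
hypothetical placeholders for your own script and arguments):

``` shell
# One-time setup: write a default Accelerate config
# (equivalent to accepting the defaults in `accelerate config`)
python -c "from accelerate.utils import write_basic_config; write_basic_config()"

# Launch the training script across all available GPUs
accelerate launch train.py --epochs 5
```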

If you’re using
[`untar_data`](https://docs.fast.ai/data.external.html#untar_data), or
may be downloading or uncompressing data or models as part of your
script, you should wrap that code with
[`rank0_first`](https://docs.fast.ai/distributed.html#rank0_first),
which runs that step just once on the master process before the
remaining processes run it in parallel. E.g. instead of:

``` python
path = untar_data(URLs.IMAGEWOOF_320)
```

…you instead use:

``` python
path = rank0_first(untar_data, URLs.IMAGEWOOF_320)
```

See below for details on the full API and underlying helper functions,
if needed – however, note that you will not need anything except the
above unless you need to change how the distributed training is
implemented.

## Parallel

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L19"
target="_blank" style="float:right; font-size:smaller">source</a>

### DataParallel.reset

``` python

def reset(
    
):

```

*Patch required `reset` call into `DataParallel`*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L24"
target="_blank" style="float:right; font-size:smaller">source</a>

### ParallelTrainer

``` python

def ParallelTrainer(
    device_ids
):

```

*Wrap a model `DataParallel` automatically*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L33"
target="_blank" style="float:right; font-size:smaller">source</a>

### Learner.to_parallel

``` python

def to_parallel(
    device_ids=None
):

```

*Add
[`ParallelTrainer`](https://docs.fast.ai/distributed.html#paralleltrainer)
callback to a [`Learner`](https://docs.fast.ai/learner.html#learner)*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L40"
target="_blank" style="float:right; font-size:smaller">source</a>

### Learner.detach_parallel

``` python

def detach_parallel(
    
):

```

*Remove
[`ParallelTrainer`](https://docs.fast.ai/distributed.html#paralleltrainer)
callback from a Learner*

------------------------------------------------------------------------

### parallel_ctx

``` python

def parallel_ctx(
    device_ids=None
):

```

*A context manager to adapt a learner to train in data parallel mode.*

## Distributed

### Helper functions

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L57"
target="_blank" style="float:right; font-size:smaller">source</a>

### DistributedDataParallel.reset

``` python

def reset(
    
):

```

*Patch required `reset` call into `DistributedDataParallel`*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L62"
target="_blank" style="float:right; font-size:smaller">source</a>

### setup_distrib

``` python

def setup_distrib(
    gpu=None
):

```

*Setup this process to participate in distributed training*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L71"
target="_blank" style="float:right; font-size:smaller">source</a>

### teardown_distrib

``` python

def teardown_distrib(
    
):

```

*Free distributed training resources*

### DataLoader

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L79"
target="_blank" style="float:right; font-size:smaller">source</a>

### DistributedDL

``` python

def DistributedDL(
    dl, rank=None, world_size=None, device=None
):

```

*A [`TfmdDL`](https://docs.fast.ai/data.core.html#tfmddl) which splits a
batch into equal size pieces for each worker*

``` python
dl = TfmdDL(list(range(50)), bs=12, num_workers=2)
for i in range(4):
    dl1 = DistributedDL(dl, i, 4)
    test_eq(list(dl1), (torch.arange(i*13, i*13+12)%50,torch.tensor([i*13+12])%50))
```
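
The arithmetic behind this split can be sketched in plain Python.
The `per_rank_indices` helper below is hypothetical (not part of
fastai), but mirrors the sharing scheme exercised by the test above:
each of the `world_size` ranks receives `ceil(n/world_size)` contiguous
indices, wrapping modulo the dataset size so every rank sees the same
number of items.

``` python
import math

def per_rank_indices(n_items, rank, world_size):
    # Hypothetical helper mirroring how DistributedDL shares out indices:
    # each rank gets ceil(n_items / world_size) contiguous indices,
    # wrapping modulo the dataset size so all ranks get equal shares.
    per_rank = math.ceil(n_items / world_size)
    start = rank * per_rank
    return [i % n_items for i in range(start, start + per_rank)]

# 50 items over 4 ranks -> 13 indices per rank; rank 3 wraps past the
# end, matching the torch.arange(i*13, i*13+12) % 50 pattern above.
print(per_rank_indices(50, 3, 4))
```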

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L140"
target="_blank" style="float:right; font-size:smaller">source</a>

### DistributedTrainer

``` python

def DistributedTrainer(
    sync_bn:bool=True, # Whether to replace all batch norm with `nn.SyncBatchNorm`
    device_placement:bool=True, split_batches:bool|None=None,
    gradient_accumulation_steps:int=1, cpu:bool=False, dataloader_config:DataLoaderConfiguration | None=None,
    deepspeed_plugin:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    fsdp_plugin:FullyShardedDataParallelPlugin | None=None,
    torch_tp_plugin:TorchTensorParallelPlugin | None=None, # Deprecate later, warning in `post_init`
    megatron_lm_plugin:MegatronLMPlugin | None=None, rng_types:list[str | RNGType] | None=None,
    project_dir:str | os.PathLike | None=None, project_config:ProjectConfiguration | None=None,
    gradient_accumulation_plugin:GradientAccumulationPlugin | None=None,
    kwargs_handlers:list[KwargsHandler] | None=None, dynamo_backend:DynamoBackend | str | None=None,
    dynamo_plugin:TorchDynamoPlugin | None=None,
    deepspeed_plugins:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    parallelism_config:ParallelismConfig | None=None
):

```

*Wrap `model` in `DistributedDataParallel` and `dls` in
[`DistributedDL`](https://docs.fast.ai/distributed.html#distributeddl)*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L168"
target="_blank" style="float:right; font-size:smaller">source</a>

### Learner.to_distributed

``` python

def to_distributed(
    sync_bn:bool=True, # Whether to replace all batch norm with `nn.SyncBatchNorm`
    device_placement:bool=True, split_batches:bool|None=None,
    gradient_accumulation_steps:int=1, cpu:bool=False, dataloader_config:DataLoaderConfiguration | None=None,
    deepspeed_plugin:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    fsdp_plugin:FullyShardedDataParallelPlugin | None=None,
    torch_tp_plugin:TorchTensorParallelPlugin | None=None, # Deprecate later, warning in `post_init`
    megatron_lm_plugin:MegatronLMPlugin | None=None, rng_types:list[str | RNGType] | None=None,
    project_dir:str | os.PathLike | None=None, project_config:ProjectConfiguration | None=None,
    gradient_accumulation_plugin:GradientAccumulationPlugin | None=None,
    kwargs_handlers:list[KwargsHandler] | None=None, dynamo_backend:DynamoBackend | str | None=None,
    dynamo_plugin:TorchDynamoPlugin | None=None,
    deepspeed_plugins:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    parallelism_config:ParallelismConfig | None=None
):

```

*Add `AcceleratedTrainer` to a learner and configure an `Accelerator`*

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L179"
target="_blank" style="float:right; font-size:smaller">source</a>

### Learner.detach_distributed

``` python

def detach_distributed(
    
):

```

*Remove
[`DistributedTrainer`](https://docs.fast.ai/distributed.html#distributedtrainer)
from a learner*

### `distrib_ctx` context manager

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L190"
target="_blank" style="float:right; font-size:smaller">source</a>

### Learner.distrib_ctx

``` python

def distrib_ctx(
    sync_bn:bool=True, # Whether to replace all batch norm with `nn.SyncBatchNorm`
    in_notebook:bool=False, # Whether we are launching from a notebook or not
    device_placement:bool=True, split_batches:bool|None=None,
    gradient_accumulation_steps:int=1, cpu:bool=False, dataloader_config:DataLoaderConfiguration | None=None,
    deepspeed_plugin:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    fsdp_plugin:FullyShardedDataParallelPlugin | None=None,
    torch_tp_plugin:TorchTensorParallelPlugin | None=None, # Deprecate later, warning in `post_init`
    megatron_lm_plugin:MegatronLMPlugin | None=None, rng_types:list[str | RNGType] | None=None,
    project_dir:str | os.PathLike | None=None, project_config:ProjectConfiguration | None=None,
    gradient_accumulation_plugin:GradientAccumulationPlugin | None=None,
    kwargs_handlers:list[KwargsHandler] | None=None, dynamo_backend:DynamoBackend | str | None=None,
    dynamo_plugin:TorchDynamoPlugin | None=None,
    deepspeed_plugins:DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None=None,
    parallelism_config:ParallelismConfig | None=None
):

```

*A context manager to adapt a learner to train in distributed data
parallel mode.*

`distrib_ctx` prepares a learner to train in distributed data parallel
mode. It assumes the script/code will either be run from the command
line via `accelerate launch` or through the `notebook_launcher` function
from Accelerate. It also assumes that `accelerate` has been configured
by either running `write_basic_config()` or calling `accelerate config`
through the CLI and answering the prompts.

Typical usage:

    with learn.distrib_ctx(): learn.fit(.....)

Entering the context attaches a
[`DistributedTrainer`](https://docs.fast.ai/distributed.html#distributedtrainer)
callback and
[`DistributedDL`](https://docs.fast.ai/distributed.html#distributeddl)
data loader to the learner, so that the `learn.fit(.....)` call inside
it runs distributed. Upon exiting the context, it removes the
[`DistributedTrainer`](https://docs.fast.ai/distributed.html#distributedtrainer)
and
[`DistributedDL`](https://docs.fast.ai/distributed.html#distributeddl),
and destroys any locally created distributed process group. The process
is still attached to the GPU though.
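
The attach-on-enter, remove-on-exit behaviour can be sketched with a
plain context manager. The `Learner` stand-in and `distrib_ctx_sketch`
below are illustrative only, not fastai's implementation:

``` python
from contextlib import contextmanager

class Learner:
    # Minimal stand-in for fastai's Learner, just enough to hold callbacks.
    def __init__(self): self.cbs = []
    def add_cb(self, cb): self.cbs.append(cb)
    def remove_cb(self, cb): self.cbs.remove(cb)

@contextmanager
def distrib_ctx_sketch(learn, cb):
    # Attach the trainer callback on entry; remove it again on exit,
    # which is also where any process-group teardown would happen.
    learn.add_cb(cb)
    try:
        yield learn
    finally:
        learn.remove_cb(cb)

learn = Learner()
with distrib_ctx_sketch(learn, 'DistributedTrainer'):
    print('DistributedTrainer' in learn.cbs)  # True: active while fitting
print(learn.cbs)  # []: removed once the context exits
```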

------------------------------------------------------------------------

<a
href="https://github.com/fastai/fastai/blob/main/fastai/distributed.py#L216"
target="_blank" style="float:right; font-size:smaller">source</a>

### rank0_first

``` python

def rank0_first(
    func, *args, **kwargs
):

```

*Execute `func` in the Rank-0 process first, then in other ranks in
parallel.*

[`rank0_first`](https://docs.fast.ai/distributed.html#rank0_first) calls
`func` in the rank-0 process first, then in parallel on the rest, in
distributed training mode. In single-process, non-distributed training
mode, `func` is called only once as expected.

One application of
[`rank0_first()`](https://docs.fast.ai/distributed.html#rank0_first) is
to make fresh downloads via
[`untar_data`](https://docs.fast.ai/data.external.html#untar_data) safe
in distributed training scripts launched by
`python -m fastai.launch <script>`:

<code>path = untar_data(URLs.IMDB)</code>

becomes:

<code>path = rank0_first(lambda: untar_data(URLs.IMDB))</code>

Some learner factory methods may use
[`untar_data`](https://docs.fast.ai/data.external.html#untar_data) to
download pretrained models:

<code>learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5,
metrics=accuracy)</code>

becomes:

<code>learn = rank0_first(lambda: text_classifier_learner(dls, AWD_LSTM,
drop_mult=0.5, metrics=accuracy))</code>

Otherwise, multiple processes will download at the same time and corrupt
the data.
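
The run-on-rank-0-first pattern amounts to a barrier: rank 0 executes
the function before the barrier, everyone else only after it. A
thread-based simulation (hypothetical names, not fastai's process-based
implementation):

``` python
import threading

calls = []               # records the order in which "ranks" run the function
lock = threading.Lock()

def download(rank):
    with lock:
        calls.append(rank)   # stand-in for untar_data's download step

def rank0_first_sketch(func, rank, barrier):
    # Rank 0 runs func before reaching the barrier; the other ranks wait
    # at the barrier and only run func once rank 0 has finished.
    if rank == 0:
        func(rank)
        barrier.wait()
    else:
        barrier.wait()
        func(rank)

barrier = threading.Barrier(4)
threads = [threading.Thread(target=rank0_first_sketch, args=(download, r, barrier))
           for r in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(calls[0])  # 0: rank 0 always completes the download first
```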

## Notebook Launcher

Accelerate provides a
[notebook_launcher](https://huggingface.co/docs/accelerate/launcher)
functionality to let you keep using your Jupyter Notebook as you would,
but train in a distributed setup!

First, make sure accelerate is properly configured. You can either run
`accelerate config` from the command line, or have an autofilled
configuration setup by running in the first cell of your notebook:

``` python
from accelerate.utils import write_basic_config
write_basic_config()
```

After Accelerate is configured, to use the `notebook_launcher`
functionality move your training code into a function and pass it to
`notebook_launcher`, such as:

``` python
from fastai.vision.all import *
from fastai.distributed import *

def train():
    set_seed(99, True)
    path = untar_data(URLs.PETS)/'images'
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))

    learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()
    with learn.distrib_ctx(in_notebook=True):
        learn.fine_tune(1)
```

Then, in a separate cell, launch training across the desired number of
processes:

``` python
from accelerate import notebook_launcher
notebook_launcher(train, num_processes=2)
```
