AutoParallel is a PyTorch library that automatically shards and parallelizes models for distributed training. Given a model and a device mesh, it uses linear programming to find an optimal sharding strategy (FSDP, tensor parallelism, or a mix) and applies it — no manual parallelism code required.
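The "find an optimal sharding strategy" step can be pictured with a toy example. The sketch below is plain Python and is *not* AutoParallel's actual solver (which formulates an integer linear program over real communication and memory costs derived from the model graph); the layer names and cost numbers are made up purely for illustration. It just brute-forces the per-layer strategy assignment with the lowest total cost:

```python
from itertools import product

# Hypothetical per-layer costs (arbitrary units) for each strategy.
# AutoParallel derives real costs from the traced model and device mesh.
COSTS = {
    "layer0": {"fsdp": 4.0, "tp": 6.0},
    "layer1": {"fsdp": 7.0, "tp": 3.0},
    "layer2": {"fsdp": 5.0, "tp": 5.0},
}

def best_plan(costs):
    """Brute-force the strategy assignment with the lowest total cost."""
    layers = list(costs)
    best = None
    for choice in product(*(costs[layer] for layer in layers)):
        total = sum(costs[layer][s] for layer, s in zip(layers, choice))
        if best is None or total < best[0]:
            best = (total, dict(zip(layers, choice)))
    return best

total, plan = best_plan(COSTS)
print(total, plan)  # the cheapest plan mixes FSDP and TP across layers
```

Brute force is exponential in the number of layers, which is why a real solver uses linear programming instead; the point here is only that the output is a per-layer assignment, which may mix strategies.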
Early Development Warning — AutoParallel is experimental. Expect bugs, incomplete features, and APIs that may change. Bugfixes are welcome; please discuss significant changes in the issue tracker before starting work.
- Python >= 3.10
- PyTorch nightly (newer than 2.10)
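One gotcha when checking "newer than 2.10": version strings must be compared numerically, not lexicographically ("2.10" sorts *before* "2.9" as a string). A minimal sketch of a correct check; the `version_tuple` helper is illustrative, not part of AutoParallel:

```python
def version_tuple(v):
    """Parse '2.10.0.dev20250101' into a comparable tuple of leading ints."""
    parts = []
    for p in v.split("."):
        if not p.isdigit():
            break  # stop at suffixes like 'dev20250101'
        parts.append(int(p))
    return tuple(parts)

# String comparison is misleading; numeric comparison is correct.
assert "2.10" < "2.9"                                     # lexicographic trap
assert version_tuple("2.10.0") > version_tuple("2.9.1")   # numeric, correct

# Against an installed build this would look like:
# import torch
# assert version_tuple(torch.__version__) >= (2, 10)
```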
```bash
# Via SSH
pip install git+ssh://git@github.com/pytorch-labs/autoparallel.git
# Via HTTPS
pip install git+https://github.com/pytorch-labs/autoparallel.git
```

The simplest way to try AutoParallel is with a HuggingFace model. This runs entirely on a single machine using a fake process group, so no multi-GPU setup is needed:
```bash
pip install transformers
python examples/example_hf.py --model gpt2 --mesh 8
```

You should see log output ending with `Forward + backward OK`.
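The `--mesh 8` flag describes the device mesh: a logical grid over ranks. In PyTorch, meshes are built with `torch.distributed.device_mesh.init_device_mesh`; conceptually, 8 devices viewed as a 2x4 grid give a data-parallel dimension of size 2 and a tensor-parallel dimension of size 4. The shape bookkeeping alone can be sketched in plain Python (this helper is illustrative, not a library API):

```python
def make_mesh(world_size, shape):
    """Arrange ranks 0..world_size-1 into a row-major grid."""
    rows, cols = shape
    assert rows * cols == world_size, "mesh shape must cover all ranks"
    return [[r * cols + c for c in range(cols)] for r in range(rows)]

mesh = make_mesh(8, (2, 4))
# Each row is a tensor-parallel group of 4 ranks;
# ranks in the same column are data-parallel replicas of each other.
print(mesh)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```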
For more examples (LLaMA-3, pipeline parallelism, distributed checkpointing),
see the examples/ directory.
```bash
git clone https://github.com/pytorch-labs/autoparallel.git
cd autoparallel
pip install -e .
```

Because the install is editable, changes to Python files take effect immediately without reinstalling.
Run linters before submitting a PR:
```bash
pip install pre-commit
pre-commit run --all-files
```

Run the tests (a CUDA GPU is required):
```bash
pip install -r requirements-test.txt
pytest tests/
```

AutoParallel is BSD-3 licensed, as found in the LICENSE file.