Examples
Single-node training

- TRL: Fine-tune Llama 3.1 8B on a custom dataset using TRL (a minimal sketch follows this list).
- Axolotl: Fine-tune Llama 4 on a custom dataset using Axolotl.
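The single-node examples boil down to pointing a trainer at a model and a dataset. Below is a minimal sketch of the TRL path, assuming the `trl` and `datasets` packages are installed; the model ID and the `trl-lib/Capybara` dataset are illustrative placeholders, not the exact configuration from the example.

```python
# Single-node SFT with TRL -- a sketch, not the example's exact config.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B",                 # placeholder model ID
    train_dataset=dataset,
    args=SFTConfig(output_dir="llama-3.1-8b-sft"),
)
trainer.train()
```

The Axolotl example covers the same ground but drives the run from a YAML config rather than Python code.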
Distributed training

- TRL: Fine-tune an LLM on multiple nodes with TRL, Accelerate, and DeepSpeed (see the Accelerate sketch after this list).
- Axolotl: Fine-tune an LLM on multiple nodes with Axolotl.
- Ray+RAGEN: Fine-tune an agent on multiple nodes with RAGEN, verl, and Ray.
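Multi-node fine-tuning layers a launcher on top of an otherwise ordinary training loop. A sketch of the Accelerate side, assuming the script is started on every node with `accelerate launch` (passing `--num_machines`, `--machine_rank`, and a DeepSpeed config); the tiny linear model and dummy loss are stand-ins for the real LLM and objective.

```python
# Multi-node-ready training step with Accelerate; DeepSpeed/DDP settings
# come from the launcher, not from this script.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(512, 512)              # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model, optimizer = accelerator.prepare(model, optimizer)

for step in range(10):
    batch = torch.randn(8, 512, device=accelerator.device)
    loss = model(batch).pow(2).mean()          # dummy loss for illustration
    accelerator.backward(loss)                 # DeepSpeed-/DDP-aware backward
    optimizer.step()
    optimizer.zero_grad()
```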
Clusters

- GCP: Set up GCP A4 and A3 clusters with optimized networking.
- AWS: Set up AWS EFA clusters with optimized networking.
- Lambda: Set up Lambda clusters with optimized networking.
- Crusoe: Set up Crusoe clusters with optimized networking.
- NCCL/RCCL tests: Run multi-node NCCL tests with MPI (a lightweight stand-in sketch follows this list).
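Before running real workloads on a new cluster, it is worth verifying interconnect bandwidth. The NCCL tests example uses the official `nccl-tests` binaries under MPI; as a lighter stand-in, a rough all-reduce benchmark can be sketched with `torch.distributed` (launched with `torchrun` on every node; sizes and iteration counts are illustrative).

```python
# Rough all-reduce bandwidth check over NCCL -- a lightweight stand-in for
# the nccl-tests binaries. Launch on each node with torchrun, e.g.:
#   torchrun --nnodes 2 --nproc-per-node 8 \
#            --rdzv-backend c10d --rdzv-endpoint $HEAD_IP:29500 allreduce_check.py
import os
import time

import torch
import torch.distributed as dist

dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

x = torch.randn(64 * 1024 * 1024, device="cuda")   # 256 MiB of fp32

for _ in range(5):                                  # warm-up
    dist.all_reduce(x)
torch.cuda.synchronize()

iters = 20
start = time.time()
for _ in range(iters):
    dist.all_reduce(x)
torch.cuda.synchronize()
elapsed = (time.time() - start) / iters

if dist.get_rank() == 0:
    n = dist.get_world_size()
    gb = x.numel() * 4 / 1e9
    # nccl-tests "bus bandwidth" convention: algorithmic BW * 2*(n-1)/n
    print(f"{gb:.2f} GB all_reduce: {elapsed * 1e3:.1f} ms, "
          f"~{gb / elapsed * 2 * (n - 1) / n:.1f} GB/s bus bandwidth")
dist.destroy_process_group()
```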
Inference

- SGLang: Deploy DeepSeek distilled models with SGLang.
- vLLM: Deploy Llama 3.1 with vLLM (see the offline-inference sketch after this list).
- TGI: Deploy Llama 4 with TGI.
- NIM: Deploy a DeepSeek distilled model with NIM.
- TensorRT-LLM: Deploy DeepSeek models with TensorRT-LLM.
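All of the inference examples follow the same shape: pull a checkpoint, start an engine or server, and send requests. A minimal offline sketch with vLLM's Python API, where the model ID is a placeholder for whichever checkpoint the example actually deploys:

```python
# Offline generation with vLLM -- the deployment examples expose the same
# engine behind an OpenAI-compatible HTTP endpoint instead.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")   # placeholder model ID
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Briefly explain tensor parallelism."], params)
print(outputs[0].outputs[0].text)
```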
Accelerators

- AMD: Deploy and fine-tune LLMs on AMD GPUs.
- TPU: Deploy and fine-tune LLMs on Google Cloud TPUs.
- Intel Gaudi: Deploy and fine-tune LLMs on Intel Gaudi.
- Tenstorrent: Deploy and fine-tune LLMs on Tenstorrent.
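The accelerator examples mostly differ in which PyTorch backend the same code targets. Below is a sketch of backend-agnostic device selection, with imports guarded because each environment typically ships only one backend; `pick_device` is a hypothetical helper, not part of any of these SDKs, and Tenstorrent's own stack (TT-NN) sits outside stock PyTorch, so it is omitted here.

```python
# Backend-agnostic device pick; NVIDIA and AMD ROCm builds of PyTorch both
# report "cuda", TPU goes through torch_xla, and Gaudi registers an "hpu"
# device once habana_frameworks is imported. `pick_device` is hypothetical.
import torch

def pick_device() -> torch.device:
    if torch.cuda.is_available():              # NVIDIA CUDA or AMD ROCm
        return torch.device("cuda")
    try:
        import torch_xla.core.xla_model as xm  # TPU via torch_xla
        return xm.xla_device()
    except ImportError:
        pass
    try:
        import habana_frameworks.torch.core    # Intel Gaudi via SynapseAI
        return torch.device("hpu")
    except ImportError:
        pass
    return torch.device("cpu")

if __name__ == "__main__":
    print(f"training on: {pick_device()}")
```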