
Torchtitan: Large-Scale LLM Training Using Native PyTorch 3D Parallelism

Description

torchtitan is a proof of concept for large-scale LLM training using native PyTorch. The repo showcases PyTorch's latest distributed training features in a clean, minimal codebase. We showcase end-to-end enablement of large-scale training features:
1. 3D/4D parallelism
2. Efficient distributed checkpoint save, load, and resharding
3. Efficient training techniques, including Float8, torch.compile, activation checkpointing, and others
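To make the 3D parallelism item concrete, here is a minimal pure-Python sketch of how ranks can be laid out on a three-dimensional mesh, one axis each for pipeline, data, and tensor parallelism. It is an illustration only and is not taken from torchtitan: the helper names `rank_to_coords` and `groups_along`, the `(pp, dp, tp)` axis order, and the 2 x 2 x 2 sizes are all assumptions chosen for the example.

```python
from itertools import product

# Hypothetical mesh sizes for 8 GPUs, ordered (pp, dp, tp). Assumed for illustration.
MESH_SHAPE = (2, 2, 2)


def rank_to_coords(rank, shape):
    """Return the row-major coordinates of `rank` in a mesh of the given shape."""
    coords = []
    for size in reversed(shape):
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))


def groups_along(shape, dim):
    """Return the groups of ranks that differ only in their coordinate along `dim`.

    Each group is the set of ranks that would communicate with each other
    for the kind of parallelism assigned to that mesh axis.
    """
    # Row-major strides: the last axis varies fastest.
    strides = []
    stride = 1
    for size in reversed(shape):
        strides.append(stride)
        stride *= size
    strides.reverse()

    other_dims = [d for d in range(len(shape)) if d != dim]
    groups = []
    for fixed in product(*(range(shape[d]) for d in other_dims)):
        base = sum(c * strides[d] for c, d in zip(fixed, other_dims))
        groups.append([base + i * strides[dim] for i in range(shape[dim])])
    return groups


if __name__ == "__main__":
    for rank in range(8):
        print(rank, rank_to_coords(rank, MESH_SHAPE))
    print("tensor-parallel groups:", groups_along(MESH_SHAPE, 2))
    print("pipeline-parallel groups:", groups_along(MESH_SHAPE, 0))
```

In this layout the tensor-parallel axis varies fastest, so each tensor-parallel group consists of adjacent ranks. That is one common choice because tensor parallelism tends to be the most communication-heavy of the three; other orderings are equally valid.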
