Description
As generative AI models grow larger and more complex, the ability to scale these models becomes a critical challenge facing enterprises today. How can developers leverage PyTorch to maximize the value of these large, multi-billion parameter models to make them run faster, more efficiently, and more affordably both on-prem and in the cloud? This keynote will highlight various levers that PyTorch FSDP provides to scale AI model training on hundreds of GPUs and how IBM applied them to obtain state-of-the-art training throughput in models with up to 70 billion parameters. It will also discuss how we combined the latest advancements in PyTorch compile with custom tensor parallel implementation to achieve significantly reduced inferencing latency.