
Keynote: How to Leverage PyTorch to Scale AI Training and Inferencing

Description

As generative AI models grow larger and more complex, scaling them has become a critical challenge for enterprises. How can developers leverage PyTorch to get the most value from these large, multi-billion-parameter models and make them run faster, more efficiently, and more affordably, both on-prem and in the cloud? This keynote will highlight the levers that PyTorch FSDP provides for scaling AI model training across hundreds of GPUs, and how IBM applied them to obtain state-of-the-art training throughput in models with up to 70 billion parameters. It will also discuss how we combined the latest advancements in PyTorch compile with a custom tensor-parallel implementation to significantly reduce inferencing latency.
