
FlexAttention - The Flexibility of PyTorch + The Performance of FlashAttention

Description

Introducing a novel abstraction that leverages the PyTorch compiler stack to enable custom, user-defined attention mechanisms. The new API supports dynamic modifications to attention scores within scaled dot-product attention (SDPA), achieving both runtime and memory efficiency through kernel fusion with the FlashAttention algorithm.
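To make the idea concrete, here is a minimal pure-Python sketch of what a user-defined score modification does inside scaled dot-product attention. This is a conceptual illustration only, not the real API: in FlexAttention the `score_mod` callable is traced by the PyTorch compiler and fused into a FlashAttention-style kernel, whereas this sketch applies it eagerly. The function names and the per-element `(score, q_idx, kv_idx)` calling convention here are illustrative.

```python
import math

def attention(q, k, v, score_mod=None):
    """Single-head SDPA over lists of float vectors, with an optional
    user-defined modification applied to each raw attention score."""
    d = len(q[0])
    out = []
    for q_idx, qvec in enumerate(q):
        # Scaled dot-product scores between this query and every key.
        scores = [sum(a * b for a, b in zip(qvec, kvec)) / math.sqrt(d)
                  for kvec in k]
        # User-defined score modification (the core FlexAttention idea):
        # each score can be rewritten as a function of its position.
        if score_mod is not None:
            scores = [score_mod(s, q_idx, kv_idx)
                      for kv_idx, s in enumerate(scores)]
        # Numerically stable softmax over the (possibly masked) scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of value vectors.
        out.append([sum(w * vvec[j] for w, vvec in zip(weights, v))
                    for j in range(len(v[0]))])
    return out

# Example: causal masking expressed as a score modification.
def causal(score, q_idx, kv_idx):
    return score if kv_idx <= q_idx else float("-inf")

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = attention(q, k, v, score_mod=causal)
# The first query may only attend to the first key/value,
# so out[0] reproduces v[0] exactly.
```

Other common mechanisms (ALiBi, sliding-window attention, relative position biases) can be expressed the same way by returning a biased or masked score instead of writing a new kernel for each variant.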

