
FlexAttention - The Flexibility of PyTorch + The Performance of FlashAttention

Description

Introducing a novel abstraction that leverages the PyTorch compiler stack to enable custom, user-defined attention mechanisms. The new API supports dynamic modifications to attention scores within scaled dot-product attention (SDPA), achieving both runtime and memory efficiency through kernel fusion with the FlashAttention algorithm.
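To make the idea concrete, here is a minimal pure-Python sketch of what a user-defined score modification does inside scaled dot-product attention. This is a conceptual illustration only, not the real API: in FlexAttention the `score_mod` callable is traced by the PyTorch compiler and fused into a FlashAttention-style kernel, whereas this sketch applies it eagerly. The function names and the per-element `(score, q_idx, kv_idx)` calling convention here are illustrative.

```python
import math

def attention(q, k, v, score_mod=None):
    """Single-head SDPA over lists of float vectors, with an optional
    user-defined modification applied to each raw attention score."""
    d = len(q[0])
    out = []
    for q_idx, qvec in enumerate(q):
        # Scaled dot-product scores between this query and every key.
        scores = [sum(a * b for a, b in zip(qvec, kvec)) / math.sqrt(d)
                  for kvec in k]
        # User-defined score modification (the core FlexAttention idea):
        # each score can be rewritten as a function of its position.
        if score_mod is not None:
            scores = [score_mod(s, q_idx, kv_idx)
                      for kv_idx, s in enumerate(scores)]
        # Numerically stable softmax over the (possibly masked) scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of value vectors.
        out.append([sum(w * vvec[j] for w, vvec in zip(weights, v))
                    for j in range(len(v[0]))])
    return out

# Example: causal masking expressed as a score modification.
def causal(score, q_idx, kv_idx):
    return score if kv_idx <= q_idx else float("-inf")

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = attention(q, k, v, score_mod=causal)
# The first query may only attend to the first key/value,
# so out[0] reproduces v[0] exactly.
```

Other common mechanisms (ALiBi, sliding-window attention, relative position biases) can be expressed the same way by returning a biased or masked score instead of writing a new kernel for each variant.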

