thu-ml/SageAttention - Gitstar Ranking

thu-ml

Fetched on 2026/03/01 18:37

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

https://arxiv.org/abs/2410.02367

Star: 3184

Rank: 12690
