[NeurIPS'24 Spotlight] To speed up long-context LLM inference, attention is computed with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy. - View it on GitHub
Stars: 796 · Rank: 42679
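Below is a minimal sketch of the general idea of dynamic block-sparse attention for pre-filling, not the repository's actual implementation: block importance is estimated cheaply from mean-pooled queries and keys, and full attention is computed only inside the top-k key blocks selected per query block. The function name, block size, and `top_k` are illustrative assumptions, and causal masking is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def dynamic_block_sparse_attention(q, k, v, block_size=64, top_k=4):
    # q, k, v: (seq_len, head_dim) for a single attention head.
    seq_len, dim = q.shape
    n_blocks = seq_len // block_size

    # Mean-pool queries and keys per block to cheaply estimate block importance.
    q_blocks = q[: n_blocks * block_size].reshape(n_blocks, block_size, dim).mean(1)
    k_blocks = k[: n_blocks * block_size].reshape(n_blocks, block_size, dim).mean(1)
    block_scores = q_blocks @ k_blocks.T  # (n_blocks, n_blocks)

    # For each query block, keep only its top-k most relevant key blocks.
    keep = torch.topk(block_scores, k=min(top_k, n_blocks), dim=-1).indices

    out = torch.zeros_like(q)
    for qb in range(n_blocks):
        q_slice = q[qb * block_size : (qb + 1) * block_size]
        # Gather token indices of the selected key/value blocks.
        idx = torch.cat([
            torch.arange(b * block_size, (b + 1) * block_size)
            for b in keep[qb].tolist()
        ])
        scores = (q_slice @ k[idx].T) / dim ** 0.5
        out[qb * block_size : (qb + 1) * block_size] = F.softmax(scores, dim=-1) @ v[idx]
    return out

# Example: a 1024-token prefill with 128-dim heads (hypothetical sizes).
q, k, v = (torch.randn(1024, 128) for _ in range(3))
print(dynamic_block_sparse_attention(q, k, v).shape)  # torch.Size([1024, 128])
```

Because only a few key blocks are attended to per query block, the score matrix shrinks from O(n^2) to roughly O(n · top_k · block_size), which is where the pre-filling speedup comes from.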