Gitstar Ranking
microsoft/RetrievalAttention
Fetched on 2026/04/14 13:27
[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system.
https://aka.ms/RetroInfer
Stars: 135
Rank: 241816