Gitstar Ranking
microsoft / RetrievalAttention
Fetched on 2025/06/02 01:56
Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system.
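The description points at a retrieval-style approach: instead of attending over every cached key/value pair, the decoder queries the KV cache like a vector index and attends only over the top-k most similar entries. A minimal sketch of that idea follows; the function name, shapes, and exact-top-k search are illustrative assumptions, not the repository's actual API (real systems typically use approximate nearest-neighbor indexes rather than a full scan):

```python
import numpy as np

def sparse_attention(query, keys, values, k):
    """Approximate attention for one query by retrieving only the
    top-k most similar cached keys, treating the KV cache as a
    vector store, instead of attending over the full cache."""
    # Similarity score between the query and every cached key.
    scores = keys @ query                          # shape (n,)
    # Retrieve the k highest-scoring entries, as a vector index would.
    topk = np.argpartition(scores, -k)[-k:]
    # Softmax over the retrieved subset only.
    sub = scores[topk] / np.sqrt(query.shape[0])
    weights = np.exp(sub - sub.max())
    weights /= weights.sum()
    # Weighted combination of the retrieved values.
    return weights @ values[topk]
```

With `k` equal to the cache size this reduces to ordinary softmax attention; the savings come from picking `k` much smaller than the cache while the softmax mass stays concentrated on the retrieved entries.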
View it on GitHub: https://aka.ms/RetroInfer
Stars: 44
Rank: 503379