Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system.
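The core idea can be illustrated with a toy sketch (this is an illustrative assumption, not the project's actual implementation): treat the cached keys as a vector store, retrieve only the top-k keys most similar to the current query, and compute attention over that sparse subset instead of the full cache.

```python
import numpy as np

def sparse_attend(q, K, V, k=4):
    """Toy sketch: treat the KV cache (K, V) as a vector store and
    attend only over the top-k keys most similar to the query q."""
    scores = K @ q                             # similarity of q to every cached key
    top = np.argpartition(scores, -k)[-k:]     # indices of the k best-matching keys
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                               # softmax over the retrieved subset only
    return w @ V[top]                          # weighted sum of the retrieved values

rng = np.random.default_rng(0)
K = rng.standard_normal((128, 16))             # 128 cached key vectors, dim 16
V = rng.standard_normal((128, 16))             # matching value vectors
q = rng.standard_normal(16)                    # current query
out = sparse_attend(q, K, V, k=8)
print(out.shape)  # (16,)
```

Because only k of the 128 cached entries are touched per step, the per-token attention cost scales with k rather than with the full context length.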