gmh5225/turboquant-pytorch - Gitstar Ranking

Fetched on 2026/07/13 21:13

A from-scratch PyTorch implementation of Google's TurboQuant (ICLR 2026) for LLM KV cache compression. The project reports 5x compression at 3 bits with 99.5% attention fidelity.

Star

Rank: 14120501
