Open-source PyTorch implementation of Google TurboQuant (ICLR 2026) — extreme KV-cache quantization to ~3 bits with zero accuracy loss. 6x less memory, up to 8x faster inference. - View it on GitHub
Star
0
Rank
13993518