Advanced quantization algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA, and HPU. Seamlessly integrated with Torchao, Transformers, and vLLM. Export your models effortlessly to autogptq, autoawq, gguf, and autoround formats with higher accuracy, even at extremely low bit precision.