Production-ready LLM compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.