zhihu/TLLM_QMM - Gitstar Ranking

zhihu

Fetched on 2026/03/01 19:33

TLLM_QMM extracts the quantized-kernel implementation from Nvidia's TensorRT-LLM, removes the NVInfer dependency, and exposes the kernels as easy-to-use PyTorch modules. We modified the dequantization and weight preprocessing to align with popular quantization algorithms such as AWQ and GPTQ, and combined them with new FP8 quantization.
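
The description mentions aligning dequantization with AWQ and GPTQ. As a rough illustration only (not TLLM_QMM's actual kernel or API), 4-bit checkpoints in these formats are commonly stored as eight values packed into each 32-bit word, with a per-group scale s and zero point z, and weights are recovered as w = (q - z) * s. The sketch below assumes lowest-nibble-first packing; the helper names are hypothetical, and real checkpoints differ in packing order and in how zero points are stored.

```python
from typing import List


def unpack_int4(packed: int) -> List[int]:
    """Unpack eight unsigned 4-bit values from one 32-bit word.

    Assumption: the lowest nibble holds the first value.
    """
    return [(packed >> (4 * i)) & 0xF for i in range(8)]


def dequantize_group(qvals: List[int], scale: float, zero: int) -> List[float]:
    """Asymmetric dequantization of one group: w = (q - zero) * scale."""
    return [(q - zero) * scale for q in qvals]


def dequantize_row(
    packed_words: List[int],
    scales: List[float],
    zeros: List[int],
    group_size: int,
) -> List[float]:
    """Dequantize one row of packed int4 weights using per-group scale/zero."""
    q = [v for word in packed_words for v in unpack_int4(word)]
    if len(q) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    out: List[float] = []
    for g in range(len(q) // group_size):
        group = q[g * group_size:(g + 1) * group_size]
        out.extend(dequantize_group(group, scales[g], zeros[g]))
    return out


if __name__ == "__main__":
    # Two words -> 16 int4 values, split into two groups of 8.
    row = dequantize_row([0x76543210, 0xFEDCBA98], [0.5, 0.25], [8, 8], 8)
    print(row)
```

A GPU kernel would fuse this unpacking and scaling into the matrix multiply rather than materializing full-precision weights; the loop form here only shows the arithmetic.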

Rank: 1175951