Gitstar Ranking
L1aoXingyu / marlin
Fetched on 2025/03/15 09:10
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
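The ~4x figure in the description can be read as a memory-bandwidth estimate: at small batch sizes a GEMM against a large weight matrix is dominated by reading the weights, and 4-bit weights move roughly a quarter of the bytes that 16-bit weights do. The Python sketch below is illustrative only and is not taken from the repository; the layer shape and the assumption that the operation is fully memory-bound are hypothetical.

```python
# Illustrative estimate (not from the repo): why FP16xINT4 can approach ~4x
# speedup at small batch sizes. Runtime of a memory-bound GEMM y = x @ W is
# roughly proportional to the bytes moved, which are dominated by W.

def est_bytes(batch, k, n, weight_bits):
    """Rough bytes moved for y = x @ W with FP16 activations and outputs."""
    weights = k * n * weight_bits / 8   # weight matrix at the given precision
    activations = batch * k * 2         # FP16 input x
    outputs = batch * n * 2             # FP16 output y
    return weights + activations + outputs

batch, k, n = 16, 4096, 4096            # hypothetical layer shape
fp16 = est_bytes(batch, k, n, 16)
int4 = est_bytes(batch, k, n, 4)
print(f"estimated memory-bound speedup: {fp16 / int4:.2f}x")  # ~3.9x
```

As the batch grows, activation traffic and compute become a larger share of the cost, which is why the near-ideal speedup is stated only up to batch sizes of about 16-32 tokens.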
Stars: 0
Rank: 12125866