High-performance CPU GEMM kernels (C = A·Bᵀ + C) optimized for LLM inference, featuring AVX2/AVX-512 SIMD and multi-threading. Benchmarked against OpenBLAS.
Stars: 0 · Rank: 13789574
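For readers unfamiliar with the notation, the operation C = A·Bᵀ + C named in the description can be sketched as a plain scalar loop. This is a minimal reference sketch, not the project's kernel: the function name `gemm_nt_ref`, the row-major `float` layout, and the argument order are all assumptions made for illustration. It assumes A is M×K, B is N×K, and C is M×N.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference kernel (name and signature are illustrative only).
 * Computes C[m][n] += sum over k of A[m][k] * B[n][k].
 * Assumed layout: A is M x K, B is N x K, C is M x N, all row-major. */
void gemm_nt_ref(size_t M, size_t N, size_t K,
                 const float *A, const float *B, float *C)
{
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            /* Row m of A and row n of B are both contiguous along k. */
            for (size_t k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[n * K + k];
            }
            /* Accumulate into the existing value of C (the "+ C" term). */
            C[m * N + n] += acc;
        }
    }
}
```

Because B is used in transposed form, the inner loop walks both operands with unit stride, which is the access pattern that SIMD loads favor. A scalar loop like this is typically useful only as a correctness baseline when checking vectorized or multi-threaded kernels.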