Safety-hardened GEMM (general matrix multiply) implementation that reaches 169.8 GFLOPS on an Intel i9-14900. Built for embedded systems and safety-critical applications where reliability matters as much as speed. 162× faster than a naive implementation, zero undefined behavior (UB), fully validated.
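For context on the "naive" baseline and the GFLOPS figure, here is a minimal sketch in C. It is not the project's code: the names `gemm_naive`, `gemm_gflops`, and `gemm_status` are hypothetical, and the row-major layout, `float` element type, and argument checks are assumptions made for illustration. It shows a plain triple-loop multiply with defensive validation, and the conventional 2·M·N·K operation count that GEMM throughput numbers are usually based on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; not taken from the project. */
typedef enum { GEMM_OK = 0, GEMM_BAD_ARG = 1 } gemm_status;

/* Naive row-major C = A * B, with A (m x k), B (k x n), C (m x n).
 * Assumes the caller's buffers hold at least m*k, k*n and m*n floats. */
gemm_status gemm_naive(size_t m, size_t n, size_t k,
                       const float *a, const float *b, float *c)
{
    /* Reject null pointers instead of dereferencing them. */
    if (a == NULL || b == NULL || c == NULL) return GEMM_BAD_ARG;

    /* Reject sizes whose element counts would overflow size_t,
     * so the index arithmetic below cannot wrap around. */
    if (k != 0 && m > SIZE_MAX / k) return GEMM_BAD_ARG;
    if (n != 0 && k > SIZE_MAX / n) return GEMM_BAD_ARG;
    if (n != 0 && m > SIZE_MAX / n) return GEMM_BAD_ARG;

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
    return GEMM_OK;
}

/* Throughput in GFLOPS, counting one multiply and one add per inner step
 * (2*m*n*k operations in total). Returns 0 for a non-positive duration. */
double gemm_gflops(size_t m, size_t n, size_t k, double seconds)
{
    if (seconds <= 0.0) return 0.0;
    return 2.0 * (double)m * (double)n * (double)k / seconds / 1e9;
}
```

Under this counting convention, a 1000 × 1000 × 1000 multiply that takes one second corresponds to 2 GFLOPS. An optimized kernel would typically add cache blocking and SIMD on top of this loop nest; which techniques the project actually uses is not stated here.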