Designing and implementing a scalable systolic-array neural-network accelerator (4×4 PE array, tile-based matrix multiplication) in SystemVerilog; verified with a Python golden model and a SystemVerilog testbench, achieving functional parity with the golden model and demonstrating tiling to 64×64 matrices.
View it on GitHub
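
To illustrate what a tile-based golden model for such a design might look like, here is a minimal sketch in plain Python. It is not taken from the project: the function names (`matmul_reference`, `matmul_tiled`, `random_matrix`), the small signed-integer value range, and the loop order are all assumptions made for illustration. It splits a 64×64 multiplication into 4×4 tiles, accumulates partial products tile by tile, and checks the result against a straightforward reference.

```python
import random

TILE = 4  # assumed to match the 4x4 PE array described above


def matmul_reference(a, b):
    """Plain triple-loop matrix multiply used as the reference result."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(a, b, tile=TILE):
    """Multiply a (n x k) by b (k x m) one tile-sized block at a time.

    Each innermost block is a tile x tile multiply-accumulate, which is the
    unit of work a tile x tile array could plausibly handle in one pass
    (hypothetical mapping, not the project's actual schedule).
    """
    n, k, m = len(a), len(b), len(b[0])
    assert n % tile == 0 and k % tile == 0 and m % tile == 0
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # output tile row
        for j0 in range(0, m, tile):      # output tile column
            for k0 in range(0, k, tile):  # accumulate along the shared dimension
                for i in range(i0, i0 + tile):
                    for j in range(j0, j0 + tile):
                        acc = c[i][j]
                        for p in range(k0, k0 + tile):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def random_matrix(rows, cols, rng, lo=-8, hi=7):
    """Random small signed integers (the value range is an assumption)."""
    return [[rng.randint(lo, hi) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    a = random_matrix(64, 64, rng)
    b = random_matrix(64, 64, rng)
    assert matmul_tiled(a, b) == matmul_reference(a, b)
    print("tiled result matches reference")
```

Because the arithmetic is integer-only, the tiled and reference results can be compared for exact equality, which is the kind of check a testbench could apply to hardware output.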