An 8×8 TPU-style systolic array accelerator, implemented in Verilog and deployed on an FPGA. This version adds my own throughput improvements, made through pipeline optimization and better processing-element (PE) dataflow scheduling. The project includes full testbenches, simulation waveforms, the FPGA bitstream, and resource utilization reports.
View it on GitHub
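
For readers new to systolic arrays, here is a minimal cycle-level sketch in Python of how an N×N grid of PEs computes a matrix product. It is not taken from the project's Verilog. It assumes an output-stationary dataflow (each PE keeps its own accumulator while operands flow right and down), and the real design may use a different scheme, such as weight-stationary. The function name `systolic_matmul` and the register names are made up for illustration.

```python
def systolic_matmul(a, b):
    """Cycle-level model of an n x n output-stationary systolic array.

    a is n x k and b is k x n (lists of lists). Returns (product, cycles).
    Assumption: output-stationary dataflow, chosen only for illustration.
    """
    n = len(a)           # the array has n x n PEs
    k_dim = len(a[0])    # shared inner dimension
    acc = [[0] * n for _ in range(n)]    # one accumulator per PE
    a_reg = [[0] * n for _ in range(n)]  # A operands, moving left to right
    b_reg = [[0] * n for _ in range(n)]  # B operands, moving top to bottom

    # The last PE (n-1, n-1) sees its final operand pair at this cycle count.
    cycles = k_dim + 2 * (n - 1)

    for t in range(cycles):
        # Shift A one PE to the right; row i is fed with a skew of i cycles.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = a[i][k] if 0 <= k < k_dim else 0

        # Shift B one PE down; column j is fed with a skew of j cycles.
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = b[k][j] if 0 <= k < k_dim else 0

        # Every PE performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]

    return acc, cycles


def reference_matmul(a, b):
    """Plain triple-loop matrix product, used to check the model."""
    n, k_dim = len(a), len(a[0])
    return [[sum(a[i][k] * b[k][j] for k in range(k_dim)) for j in range(n)]
            for i in range(n)]
```

The input skew is what lets operand pairs with the same inner index meet in the right PE: at cycle `t`, PE `(i, j)` holds `a[i][t - i - j]` and `b[t - i - j][j]`. In this model an 8×8 by 8×8 product takes `8 + 2 * 7 = 22` cycles, most of which is fill and drain latency. That latency is the kind of overhead pipelining and scheduling changes usually target.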