Use an NLP model (like a transformer or a fine-tuned LLM) to take human-readable optimization requests (e.g., "optimize this kernel for matrix multiplication with high occupancy") and automatically generate or transform CUDA kernels, then compile using LLVM for performance analysis. - View it on GitHub
Star
0
Rank
13855001