gmh5225/pseudo_profiling_LLM

gmh5225

Fetched on 2026/07/13 21:13

A tiny pseudo-profiling Python script that estimates KV cache memory and a rough latency budget for sizing a deployment. Inputs: context length, target tokens, batch size, layers/heads/dim, and dtype.
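The estimate described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the repository's actual script. The function names (`kv_cache_bytes`, `decode_latency_budget_s`), the dtype-to-bytes table, and the latency model are hypothetical. "Heads" is treated as the number of KV heads, which equals the attention head count without grouped-query attention. The latency model assumes decoding is memory-bandwidth bound and needs two extra hypothetical inputs, `weight_bytes` and `mem_bw_bytes_per_s`, that are not in the original input list.

```python
from dataclasses import dataclass

# Bytes per element for common dtypes (assumed mapping; extend as needed).
DTYPE_BYTES = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1, "int8": 1}


@dataclass
class ModelShape:
    num_layers: int
    num_kv_heads: int  # equals the attention head count for plain multi-head attention
    head_dim: int
    dtype: str = "fp16"


def kv_cache_bytes(shape: ModelShape, context_len: int, target_tokens: int,
                   batch_size: int) -> int:
    """Peak KV cache size once the prompt plus all generated tokens are cached."""
    seq_len = context_len + target_tokens
    bytes_per_elem = DTYPE_BYTES[shape.dtype]
    # Factor 2: one K tensor and one V tensor per layer.
    return (2 * shape.num_layers * batch_size * seq_len
            * shape.num_kv_heads * shape.head_dim * bytes_per_elem)


def decode_latency_budget_s(weight_bytes: int, kv_bytes: int, target_tokens: int,
                            mem_bw_bytes_per_s: float) -> float:
    """Upper-bound decode time, assuming each step is memory-bandwidth bound.

    Every decode step is assumed to read all weights plus the full (peak) KV
    cache; the real cache grows step by step, so this overestimates slightly.
    """
    per_step_s = (weight_bytes + kv_bytes) / mem_bw_bytes_per_s
    return per_step_s * target_tokens


if __name__ == "__main__":
    # Hypothetical 7B-class shape: 32 layers, 32 KV heads, head_dim 128, fp16.
    shape = ModelShape(num_layers=32, num_kv_heads=32, head_dim=128, dtype="fp16")
    kv = kv_cache_bytes(shape, context_len=3584, target_tokens=512, batch_size=1)
    print(f"KV cache: {kv / 1024**3:.2f} GiB")  # prints "KV cache: 2.00 GiB"
    # Hypothetical inputs: 14e9 bytes of fp16 weights, 2 TB/s memory bandwidth.
    t = decode_latency_budget_s(weight_bytes=14e9, kv_bytes=kv,
                                target_tokens=512, mem_bw_bytes_per_s=2e12)
    print(f"Decode budget: {t:.2f} s")
```

The KV term scales linearly in every input (layers, batch, sequence length, heads, head dimension, dtype width), so doubling the batch size or the context doubles the cache. That linearity is why it is the first number to check when sizing a deployment.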
