Hybrid LLM runtime: minimal VRAM, always-on GPU prefill, optimised CPU inference