Tiny-R2: A hybrid architecture integrating SWA, CSA, HCA, mHC, and DSMoE under the DeepSeek V4 design paradigm, enabling single-GPU OPD post-training.