Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
Stars: 47
Rank: 454704