Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.