A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
Stars: 510
Rank: 60261