Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality
GitHub stars: 346
Rank: 111829