Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality