Training teachers with reinforcement learning that can teach LLMs how to reason for test-time scaling.
Stars: 0
Rank: 13840946