DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning