Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
Stars: 164
Rank: 204301