Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
Stars: 161
Rank: 195212