RL finetuning method that ensures that the inference-time compute for queries is optimized based on query difficulty, leading to significant inference efficiency. - View it on GitHub
Star
1
Rank
5554649