RL finetuning method that ensures that the inference-time compute for queries is optimized based on query difficulty, leading to significant inference efficiency. - View it on GitHub
Star
3
Rank
3127134