Official repository for the paper "A Rubric-Supervised Critic from Sparse Real-World Outcomes": a type-safe, function-calling-based LLM-as-judge evaluation framework for predicting and analyzing agent behavior.