τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment