τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment