τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment