τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment