A tool for testing and understanding the reliability of LLM agents. It evaluates agents on two key dimensions:
1. Visibility: how well the agent explains what it is doing
2. Repeatability: how consistent the agent's responses are
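The description does not say how repeatability is measured, so the following is only a minimal sketch of one plausible approach: call the agent several times with the same prompt and average the pairwise text similarity of its responses. The function name `repeatability_score`, the `agent` callable, and the choice of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, not the tool's actual API or metric.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable


def repeatability_score(agent: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Hypothetical metric: mean pairwise similarity of repeated responses.

    `agent` is any callable mapping a prompt to a response string
    (an assumption; the real tool may wrap agents differently).
    Returns a value in [0.0, 1.0], where 1.0 means every run produced
    identical text.
    """
    # Query the agent several times with the same prompt.
    responses = [agent(prompt) for _ in range(runs)]

    # Compare every unordered pair of responses.
    pairs = list(combinations(responses, 2))
    if not pairs:
        # Fewer than two responses: nothing to disagree with.
        return 1.0

    similarities = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return sum(similarities) / len(similarities)
```

A deterministic agent scores 1.0 under this sketch, while an agent whose wording changes between runs scores lower. Character-level similarity is a crude proxy; a semantic comparison (for example, embedding distance) might be more appropriate if the agent is allowed to paraphrase.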