A tool for testing and understanding the reliability of LLM agents. This tool evaluates agents on two key dimensions:
1. Visibility: How well the agent explains what it's doing
2. Repeatability: How consistent the agent's responses are
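To make the repeatability dimension concrete, here is a minimal sketch of one way such a score could be computed. It is not taken from this tool's code: the function name `repeatability_score`, the `agent` callable, and the choice of plain text similarity from Python's standard `difflib` are all assumptions made for illustration. The idea is to send the same prompt several times and average the pairwise similarity of the responses.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable


def repeatability_score(agent: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Hypothetical metric: mean pairwise text similarity across repeated runs.

    Returns a value between 0.0 (no overlap between any two responses)
    and 1.0 (every response is identical).
    """
    if runs < 2:
        raise ValueError("runs must be at least 2 to compare responses")

    # Ask the agent the same question several times.
    responses = [agent(prompt) for _ in range(runs)]

    # Compare every pair of responses character by character.
    ratios = [
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(responses, 2)
    ]
    return sum(ratios) / len(ratios)
```

A fully deterministic agent scores 1.0 under this sketch. Character-level similarity is a crude proxy, since two answers can mean the same thing while being worded differently, so a real evaluation might compare responses semantically instead.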