A flexible, extensible, and reproducible framework for evaluating LLM workflows, applications, retrieval-augmented generation pipelines, and standalone models across custom and standard datasets. - View it on GitHub
Star
12
Rank
1366385