Large language model (LLM) evaluation framework with an Elo leaderboard and A/B testing
Stars: 50
Rank: 434554