Run evaluation on LLMs using the human-eval benchmark - View it on GitHub