Run evaluation on LLMs using the HumanEval benchmark - View it on GitHub
Stars: 345 | Rank: 92041
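
For context, an evaluation like this typically follows the flow of OpenAI's human-eval package (github.com/openai/human-eval): generate one or more completions per HumanEval task, write them to a JSONL file, then score pass@k with the bundled `evaluate_functional_correctness` command. The sketch below assumes that package; the `generate_one_completion` stub is a placeholder for your own model call, not code from the listed repo.

```python
# Minimal sketch using OpenAI's human-eval package.
# generate_one_completion is a placeholder -- plug in your own LLM.
from human_eval.data import read_problems, write_jsonl

def generate_one_completion(prompt: str) -> str:
    # Placeholder: call your model here and return only the code
    # that completes the function body in `prompt`.
    raise NotImplementedError

num_samples_per_task = 1
problems = read_problems()  # maps task_id -> problem dict with a "prompt" field

samples = [
    dict(
        task_id=task_id,
        completion=generate_one_completion(problems[task_id]["prompt"]),
    )
    for task_id in problems
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)

# Then score pass@k (this executes model-generated code; run it sandboxed):
#   evaluate_functional_correctness samples.jsonl
```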