This repository provides the code and data for the paper "Which Method(s) to Pick when Evaluating Large Language Models with Humans? -- A comparison of 6 methods."