Code for the paper 'Mediocrity is the key for LLM as a Judge Anchor Selection'. This project enables systematic pairwise evaluation of multiple models on the Arena-Hard and AlpacaEval datasets and examines the effect of the chosen 'anchor', i.e., the model against which all other evaluated models are compared.
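To make the role of the anchor concrete, here is a minimal sketch of anchor-based ranking. It is not the repository's code: the function `rank_against_anchor`, the tuple layout, the model and anchor names, and the toy judgments are all hypothetical and chosen only for illustration. Each model is scored by its win rate against one fixed anchor, with a tie counted as half a win (an assumption of this sketch).

```python
from collections import defaultdict

# Score assigned to each outcome, from the evaluated model's perspective.
# Counting a tie as half a win is an assumption of this sketch.
OUTCOME_SCORE = {"win": 1.0, "tie": 0.5, "loss": 0.0}


def rank_against_anchor(judgments, anchor):
    """Rank models by their win rate against a single anchor model.

    judgments: iterable of (model, anchor_name, outcome) tuples, where
    outcome is "win", "tie", or "loss" for `model` versus `anchor_name`.
    Returns a list of (model, win_rate) pairs, best first.
    """
    scores = defaultdict(list)
    for model, judged_anchor, outcome in judgments:
        if judged_anchor != anchor:
            continue  # keep only comparisons made against this anchor
        scores[model].append(OUTCOME_SCORE[outcome])
    win_rates = {m: sum(v) / len(v) for m, v in scores.items()}
    return sorted(win_rates.items(), key=lambda kv: kv[1], reverse=True)


# Made-up judgments for illustration only; not results from the paper.
toy_judgments = [
    ("model_a", "anchor_x", "win"),
    ("model_a", "anchor_x", "tie"),
    ("model_b", "anchor_x", "loss"),
    ("model_b", "anchor_x", "win"),
    ("model_a", "anchor_y", "loss"),
    ("model_b", "anchor_y", "win"),
]

print(rank_against_anchor(toy_judgments, "anchor_x"))
print(rank_against_anchor(toy_judgments, "anchor_y"))
```

In this toy data the order of the two models flips when the anchor changes, which is the kind of anchor dependence the project is set up to examine.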