A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
Stars: 90
Rank: 269922