In this project, we develop an enterprise benchmark framework for large language model (LLM) evaluation. We extend HELM, an open-source benchmark framework developed by Stanford CRFM, to enable users to evaluate LLMs with domain-specific datasets in areas such as finance, legal, climate, and cybersecurity.
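The core idea is to turn each domain-specific dataset into prompt/reference evaluation instances that a HELM-style benchmark runner can score. The sketch below illustrates this with a hypothetical finance QA file in JSONL form; the class, function, and field names (`DomainInstance`, `load_finance_scenario`, `question`, `answer`) are illustrative assumptions, not HELM's actual API.

```python
# Minimal sketch: wrap a domain-specific QA dataset into evaluation instances.
# All names and the JSONL schema here are hypothetical, not HELM's actual API.
import json
from dataclasses import dataclass
from typing import List


@dataclass
class DomainInstance:
    """One evaluation example: a prompt and its gold reference answer."""
    prompt: str
    reference: str


def load_finance_scenario(path: str) -> List[DomainInstance]:
    """Read a JSONL file with 'question' and 'answer' fields into instances."""
    instances = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            instances.append(
                DomainInstance(prompt=record["question"], reference=record["answer"])
            )
    return instances


if __name__ == "__main__":
    # Example usage with a hypothetical finance QA dataset file.
    for inst in load_finance_scenario("finance_qa.jsonl")[:3]:
        print(inst.prompt, "->", inst.reference)
```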