Gitstar Ranking
Fetched on 2025/03/15 15:56
divelab/Sys2Bench
Sys2Bench is a benchmarking suite designed to evaluate the reasoning and planning capabilities of large language models across algorithmic, logical, arithmetic, and common-sense reasoning tasks.
https://arxiv.org/abs/2502.12521
Star: 18
Rank: 941899