Code and dataset for the paper: MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset (https://arxiv.org/pdf/2406.02106). - View it on GitHub
Star
3
Rank
2950555