Reasoning

Want to test whether your model can think logically? Benchmark its math problem solving, commonsense understanding, and multi-step reasoning.

5 tasks, 15 datasets

Tasks in Reasoning
