Multi-step Reasoning · 2021 · en
StrategyQA
2,780 yes/no questions requiring implicit multi-step reasoning to answer.
Current State of the Art
GPT-4o (OpenAI): 82.1 accuracy (primary metric)
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | GPT-4o (API, OpenAI) | 82.1 | | Dec 2025 |
| 2 | Claude 3.5 Sonnet (API, Anthropic) | 79.8 | | Dec 2025 |