strategyqa
Unknown
Reasoning benchmark
Total Results: 2
Models Tested: 2
Metrics: 1
Last Updated: 2025-12-19
Metric: accuracy (higher is better)
Strategy questions requiring implicit multi-step reasoning.

| Rank | Model | Score | Source |
|---|---|---|---|
| 1 | gpt-4o | 82.1 | arxiv-paper |
| 2 | claude-35-sonnet | 79.8 | arxiv-paper |
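
The scores above are reported as accuracy. As a rough illustration of how such a figure is typically computed, the sketch below assumes yes/no gold answers stored as booleans and a score expressed on a 0 to 100 scale. The `accuracy` helper and the toy data are hypothetical and are not taken from the leaderboard or from any evaluation harness.

```python
def accuracy(predictions, gold):
    """Return the percentage of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted answer equals the gold answer.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical toy data (assumption): five yes/no questions, four answered correctly.
gold = [True, False, True, True, False]
predictions = [True, False, False, True, False]
print(f"{accuracy(predictions, gold):.1f}")  # prints 80.0
```

Because this metric is a simple proportion, a gap such as 82.1 versus 79.8 corresponds to a 2.3 percentage point difference in questions answered correctly.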