Mathematical Reasoning, 2021, English
Grade School Math 8K (GSM8K)
About 8,500 grade school math word problems that require multi-step reasoning. One of the most widely used math reasoning benchmarks.
Current State of the Art
o1-preview (OpenAI): 97.8% accuracy
Top Models Performance Comparison
Top 5 models ranked by accuracy
Best Score: 97.8
Top Model: o1-preview
Models Compared: 5
Score Range: 6.1
Metric: accuracy (primary)
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | o1-preview (OpenAI) | 97.8 | | Dec 2025 |
| 2 | Claude 3.5 Sonnet (Anthropic, API) | 96.4 | | Dec 2025 |
| 3 | Llama 3 70B (Meta, open source) | 93 | | Dec 2025 |
| 4 | GPT-4o (OpenAI, API) | 92 | | Dec 2025 |
| 5 | Gemini 1.5 Pro (Google, API) | 91.7 | | Dec 2025 |
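
The summary figures shown above the table (best score, top model, models compared, score range) appear to be derived directly from the five table rows. The following is a minimal sketch of that arithmetic, assuming the score range is the best score minus the lowest score among the listed models. The names `ROWS` and `summarize` are hypothetical and used only for illustration; they are not part of the leaderboard itself.

```python
# Rows transcribed from the leaderboard table: (model, accuracy in percent).
ROWS = [
    ("o1-preview", 97.8),
    ("Claude 3.5 Sonnet", 96.4),
    ("Llama 3 70B", 93.0),
    ("GPT-4o", 92.0),
    ("Gemini 1.5 Pro", 91.7),
]


def summarize(rows):
    """Return the best score, top model, model count, and score range.

    Assumption: "score range" means highest score minus lowest score.
    """
    top_model, best = max(rows, key=lambda row: row[1])
    worst = min(score for _, score in rows)
    return {
        "best_score": best,
        "top_model": top_model,
        "models_compared": len(rows),
        # Round to one decimal place to suppress floating-point noise.
        "score_range": round(best - worst, 1),
    }


summary = summarize(ROWS)
print(summary)
```

Under that assumption, the computed values match the figures reported on the page: a best score of 97.8 for o1-preview, 5 models compared, and a range of 6.1 points between the first and fifth entries.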