Mathematical Reasoning · 2021 · en
MATH: Mathematics Aptitude Test of Heuristics
12,500 competition mathematics problems drawn from AMC, AIME, and other sources. Harder than GSM8K, which covers grade-school word problems.
Current State of the Art
o1-preview (OpenAI): 94.8 accuracy
Top Models Performance Comparison
Top 5 models ranked by accuracy
Best Score: 94.8
Top Model: o1-preview
Models Compared: 5
Score Range: 27.1
Metric: accuracy (primary)
| # | Model | Organization | Access | Score (accuracy, %) | Paper / Code | Date |
|---|---|---|---|---|---|---|
| 1 | o1-preview | OpenAI | | 94.8 | | Dec 2025 |
| 2 | DeepSeek V3 | DeepSeek | Open Source | 90.2 | | Dec 2025 |
| 3 | GPT-4o | OpenAI | API | 76.6 | | Dec 2025 |
| 4 | Claude 3.5 Sonnet | Anthropic | API | 71.1 | | Dec 2025 |
| 5 | Gemini 1.5 Pro | Google | API | 67.7 | | Dec 2025 |
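
The summary figures above (Best Score, Top Model, Models Compared, Score Range) appear to follow directly from the table rows, with Score Range read as the gap between the highest and lowest listed scores. The short Python sketch below recomputes them under that assumption. The names `rows` and `summarize` are illustrative only and are not part of the leaderboard site.

```python
# Leaderboard rows copied from the table: (model, accuracy score).
rows = [
    ("o1-preview", 94.8),
    ("DeepSeek V3", 90.2),
    ("GPT-4o", 76.6),
    ("Claude 3.5 Sonnet", 71.1),
    ("Gemini 1.5 Pro", 67.7),
]


def summarize(rows):
    """Recompute the summary card values from (model, score) pairs.

    Assumption: "Score Range" means max score minus min score.
    """
    scores = [score for _, score in rows]
    top_model, best_score = max(rows, key=lambda row: row[1])
    return {
        "best_score": best_score,
        "top_model": top_model,
        "models_compared": len(rows),
        # Round to one decimal to match the precision used in the table.
        "score_range": round(max(scores) - min(scores), 1),
    }


summary = summarize(rows)
print(summary)
```

Rounding to one decimal place avoids floating-point noise in the subtraction and matches the precision of the listed scores.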