Mathematical Reasoning
American Invitational Mathematics Examination 2025
AIME I + II 2025, 30 problems total. The metric is the average number of problems solved correctly out of 30, reported as a percentage. Frontier models now score above 90%.
Samples: 30
Metric: accuracy
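Since the metric is the fraction of the 30 problems solved, a raw correct-problem count converts to the reported percentage as follows (a small illustrative helper, not part of any official scoring code):

```python
def aime_accuracy(correct: int, total: int = 30) -> float:
    """Convert a raw AIME correct-problem count to percent accuracy."""
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return round(100 * correct / total, 1)

# e.g. solving 28 of the 30 problems:
print(aime_accuracy(28))  # 93.3
```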
Current State of the Art: o4-mini (OpenAI), 92.7% accuracy
Accuracy Progress Over Time
2 breakthroughs from Jan 2025 to Mar 2026.
- Total improvement: 28.8%
- Time span: 1 year 3 months
- Breakthroughs: 2
- Current SOTA: 92.7
Top Models Performance Comparison
Top 5 models ranked by accuracy.
- Best score: 92.7
- Top model: o4-mini
- Models compared: 5
- Score range: 20.7
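As a minimal sketch, the comparison stats above (top model, models compared, score range) can be derived directly from the scores in the table below; the dictionary here simply restates those values.

```python
# Scores from this page's leaderboard table (model name -> accuracy %).
scores = {
    "o4-mini": 92.7,
    "Gemini 2.5 Pro": 86.7,
    "o3": 86.7,
    "Claude Opus 4.5": 80.0,
    "DeepSeek-R1": 72.0,
}

# Rank models from best to worst by accuracy.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
best_model, best_score = ranked[0]

# Score range = gap between the best and worst of the compared models.
score_range = round(best_score - ranked[-1][1], 1)

print(best_model, best_score)  # o4-mini 92.7
print(len(ranked))             # 5
print(score_range)             # 20.7
```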
| # | Model | Access | Organization | Score | Date |
|---|---|---|---|---|---|
| 1 | o4-mini | API | OpenAI | 92.7 | Mar 2026 |
| 2 | Gemini 2.5 Pro | API | Google | 86.7 | Mar 2026 |
| 3 | o3 | API | OpenAI | 86.7 | Mar 2026 |
| 4 | Claude Opus 4.5 | API | Anthropic | 80.0 | Mar 2026 |
| 5 | DeepSeek-R1 | Open source | DeepSeek | 72.0 | Mar 2026 |