Mathematical Reasoning · 2024 · en
American Invitational Mathematics Examination (AIME) 2024
30 challenging problems from the 2024 AIME competition, testing advanced mathematical reasoning.
Metrics: accuracy, pass@1
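The two metrics coincide when a model submits one answer per problem: pass@1 is simply the fraction of problems solved on a single attempt. A minimal sketch of that scoring, assuming AIME-style integer final answers (the problem values below are hypothetical):

```python
def accuracy(predictions, answers):
    """Fraction of problems answered correctly.

    Equals pass@1 when exactly one answer is sampled per problem.
    """
    assert len(predictions) == len(answers)
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Hypothetical answers for 5 of the 30 problems:
preds = [204, 25, 55, 116, 104]
gold  = [204, 25, 60, 116, 104]
print(accuracy(preds, gold))  # 0.8
```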
Current State of the Art
o3 (OpenAI): 96.7 accuracy
AIME 2024 — accuracy
8 results · 2 SOTA advances · higher is better
Accuracy Progress Over Time
Showing 3 breakthroughs from Jan 2025 to Mar 2026
Key Milestones
- Jan 2025: DeepSeek-R1, 79.8. Average over AIME 2024 I+II (consensus @ 64 samples). Source: DeepSeek-R1 paper, arXiv:2501.12948 (Jan 2025).
- Dec 2025: o1-preview, 83.3 (+4.4%). American Invitational Mathematics Examination; elite competition math.
- Mar 2026: o3 (current SOTA), 96.7 (+16.1%). Average over AIME 2024 I+II, pass@1 consensus. Source: OpenAI o3 system card (Dec 2024).
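The "consensus @ 64 samples" scoring cited above is majority voting: draw k answers per problem, keep the most frequent one, and grade that single answer. A minimal sketch for one problem, with hypothetical samples and k = 8 rather than 64:

```python
from collections import Counter

def consensus_correct(samples, answer):
    """Majority-vote (cons@k) scoring for one problem.

    The most frequent sampled answer is taken as the model's
    final answer; ties resolve to the first value seen.
    """
    voted, _ = Counter(samples).most_common(1)[0]
    return voted == answer

# Hypothetical 8 samples for a problem whose answer is 113:
samples = [113, 113, 110, 113, 25, 113, 113, 110]
print(consensus_correct(samples, 113))  # True
```

Averaging this boolean over all 30 problems gives the consensus accuracy reported in the milestone.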
Total Improvement: 21.2% · Time Span: 1y 3m · Breakthroughs: 3 · Current SOTA: 96.7
Top Models Performance Comparison
Top 8 models ranked by accuracy (primary metric)
Best Score: 96.7 (o3) · Models Compared: 8 · Score Range: 83.3 (13.4 to 96.7)
| # | Model | Organization | Score | Date |
|---|---|---|---|---|
| 1 | o3 (API) | OpenAI | 96.7 | Mar 2026 |
| 2 | o4-mini (API) | OpenAI | 93.4 | Mar 2026 |
| 3 | Gemini 2.5 Pro (API) | Google | 92.0 | Mar 2026 |
| 4 | o1-preview | OpenAI | 83.3 | Dec 2025 |
| 5 | Claude 3.7 Sonnet (API) | Anthropic | 80.0 | Mar 2026 |
| 6 | DeepSeek-R1 (open source) | DeepSeek | 79.8 | Mar 2026 |
| 7 | Claude 3.5 Opus | Anthropic | 16.0 | Dec 2025 |
| 8 | GPT-4o (API) | OpenAI | 13.4 | Dec 2025 |