Mathematical Reasoning · 2024

American Invitational Mathematics Examination 2024

30 challenging math problems from the 2024 AIME competition (AIME I and II, 15 problems each, with integer answers from 0 to 999). Tests advanced mathematical reasoning.

Metrics: accuracy, pass@1
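The two metrics can be illustrated with a minimal sketch (function names and the sample outputs are ours, not from any benchmark harness): pass@1 averages correctness over independent samples, while consensus voting (cons@N, as cited for DeepSeek-R1 below) takes a majority vote over N samples of a single problem's integer answer.

```python
from collections import Counter

def pass_at_1(samples, answer):
    """Mean correctness across independent samples (unbiased pass@1 estimate)."""
    return sum(s == answer for s in samples) / len(samples)

def consensus(samples, answer):
    """Majority vote over N samples (cons@N): 1.0 if the plurality answer is correct."""
    majority, _ = Counter(samples).most_common(1)[0]
    return float(majority == answer)

# Hypothetical model outputs for one problem whose true answer is 204
samples = [204, 204, 17, 204, 902, 204, 204, 17]
print(pass_at_1(samples, 204))  # 0.625
print(consensus(samples, 204))  # 1.0
```

Note how consensus can score 1.0 even when most individual samples would fail, which is why cons@64 numbers are not directly comparable to pass@1.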
Current State of the Art

o3 (OpenAI): 96.7 accuracy

AIME 2024 — accuracy

8 results · 2 SOTA advances · higher is better

[Chart: all results and the SOTA frontier over time, from o1-preview to o3]

Accuracy Progress Over Time

Showing 3 breakthroughs from Jan 2025 to Mar 2026.

[Chart: accuracy vs. date]

Key Milestones

Jan 2025 · DeepSeek-R1 · 79.8
Average over AIME 2024 I+II (consensus @ 64 samples). Source: DeepSeek-R1 paper, arXiv:2501.12948 (Jan 2025).

Dec 2025 · o1-preview · 83.3 (+4.4%)
American Invitational Mathematics Examination; elite competition math.

Mar 2026 · o3 (current SOTA) · 96.7 (+16.1%)
Average over AIME 2024 I+II, pass@1 consensus. Source: OpenAI o3 system card (Dec 2024).

Total improvement: 21.2% · Time span: 1y 3m · Breakthroughs: 3 · Current SOTA: 96.7
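The percentage deltas above are relative gains over the previous milestone's score, not absolute point differences. A quick check (the helper name is ours):

```python
def rel_gain(prev, curr):
    """Relative improvement of curr over prev, in percent."""
    return 100 * (curr - prev) / prev

# Milestone scores: DeepSeek-R1 -> o1-preview -> o3
scores = [79.8, 83.3, 96.7]
for prev, curr in zip(scores, scores[1:]):
    print(f"+{rel_gain(prev, curr):.1f}%")  # +4.4%, then +16.1%

# Total improvement over the full span
print(f"{rel_gain(scores[0], scores[-1]):.1f}%")  # 21.2%
```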

Top Models Performance Comparison

Top 8 models ranked by accuracy (% of best score in parentheses):

1. o3: 96.7 (100.0%)
2. o4-mini: 93.4 (96.6%)
3. Gemini 2.5 Pro: 92.0 (95.1%)
4. o1-preview: 83.3 (86.1%)
5. Claude 3.7 Sonnet: 80.0 (82.7%)
6. DeepSeek-R1: 79.8 (82.5%)
7. Claude 3.5 Opus: 16.0 (16.5%)
8. GPT-4o: 13.4 (13.9%)

Best score: 96.7 · Top model: o3 · Models compared: 8 · Score range: 83.3
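The "% of best" figures are each model's score normalized by the top score, and the score range is simply best minus worst. A sketch reproducing both (the helper name is ours; scores are from the list above):

```python
# Scores from the comparison above, keyed by model name.
scores = {
    "o3": 96.7, "o4-mini": 93.4, "Gemini 2.5 Pro": 92.0, "o1-preview": 83.3,
    "Claude 3.7 Sonnet": 80.0, "DeepSeek-R1": 79.8,
    "Claude 3.5 Opus": 16.0, "GPT-4o": 13.4,
}

def pct_of_best(score, scores=scores):
    """Score as a percentage of the best score, rounded to one decimal."""
    return round(100 * score / max(scores.values()), 1)

print(pct_of_best(93.4))  # 96.6
print(round(max(scores.values()) - min(scores.values()), 1))  # 83.3
```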

accuracy (primary metric)

1. o3 (OpenAI, API): 96.7 · Mar 2026
2. o4-mini (OpenAI, API): 93.4 · Mar 2026
3. Gemini 2.5 Pro (Google, API): 92.0 · Mar 2026
4. o1-preview (OpenAI): 83.3 · Dec 2025
5. Claude 3.7 Sonnet (Anthropic, API): 80.0 · Mar 2026
6. DeepSeek-R1 (DeepSeek, open source): 79.8 · Mar 2026
7. Claude 3.5 Opus (Anthropic): 16.0 · Dec 2025
8. GPT-4o (OpenAI, API): 13.4 · Dec 2025

Other Mathematical Reasoning Datasets