Mathematical Reasoning

American Invitational Mathematics Examination 2025

AIME 2025 I + II, 30 problems in total. The metric is the average number of problems answered correctly out of 30, reported as a percentage. Frontier models now achieve near-perfect scores.

Samples: 30
Metric: accuracy
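The scoring above is a simple proportion of correct answers. A minimal sketch in Python (the function name is hypothetical, not from the leaderboard's code):

```python
def aime_accuracy(num_correct: int, total: int = 30) -> float:
    """Percent of the 30 AIME 2025 I+II problems answered correctly."""
    if not 0 <= num_correct <= total:
        raise ValueError("num_correct must be between 0 and total")
    return 100.0 * num_correct / total

# A model solving 28 of the 30 problems scores:
print(round(aime_accuracy(28), 1))  # 93.3
```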
Current State of the Art

o4-mini (OpenAI): 92.7 accuracy

Accuracy Progress Over Time

Showing 2 breakthroughs from Jan 2025 to Mar 2026

[Line chart: accuracy over date, Jan 2025 to Mar 2026]

Key Milestones

Jan 2025
DeepSeek-R1

Average AIME 2025 I+II (estimated from leaderboard). Source: DeepSeek-R1 technical report.

72.0
Mar 2026
o4-mini (Current SOTA)

Average over AIME 2025 I+II. Source: OpenAI o4-mini system card (April 2025).

92.7
Total Improvement: +28.8% (relative)
Time Span: 1y 3m
Breakthroughs: 2
Current SOTA: 92.7
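The improvement figure is relative, not absolute: (92.7 - 72.0) / 72.0 is roughly 28.8%. A minimal sketch of that calculation (the function name is hypothetical):

```python
def relative_improvement(old: float, new: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

# DeepSeek-R1 (72.0, Jan 2025) to o4-mini (92.7, Mar 2026):
gain = relative_improvement(72.0, 92.7)
print(f"{gain:+.1f}%")  # +28.8%
```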

Top Models Performance Comparison

Top 5 models ranked by accuracy

Rank  Model            Accuracy  % of best
1     o4-mini          92.7      100.0%
2     Gemini 2.5 Pro   86.7      93.5%
3     o3               86.7      93.5%
4     Claude Opus 4.5  80.0      86.3%
5     DeepSeek-R1      72.0      77.7%
Best Score: 92.7
Top Model: o4-mini
Models Compared: 5
Score Range: 20.7
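The "% of best" values above normalize each model's score against the top score. A sketch of that normalization, using the scores from this leaderboard:

```python
scores = {
    "o4-mini": 92.7,
    "Gemini 2.5 Pro": 86.7,
    "o3": 86.7,
    "Claude Opus 4.5": 80.0,
    "DeepSeek-R1": 72.0,
}

best = max(scores.values())
for model, score in scores.items():
    # e.g., 86.7 / 92.7 -> 93.5% of best
    print(f"{model}: {100.0 * score / best:.1f}% of best")
```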

accuracy (primary metric)

#  Model            Org        Access       Score  Paper / Code  Date
1  o4-mini          OpenAI     API          92.7   -             Mar 2026
2  Gemini 2.5 Pro   Google     API          86.7   -             Mar 2026
3  o3               OpenAI     API          86.7   -             Mar 2026
4  Claude Opus 4.5  Anthropic  API          80.0   -             Mar 2026
5  DeepSeek-R1      DeepSeek   Open Source  72.0   -             Mar 2026

Other Mathematical Reasoning Datasets