Codesota · Benchmark · AIME 2025

AIME 2025.

Olympiad-style short-answer math benchmark commonly reported in reasoning-model releases. The test set is small (30 problems across AIME 2025 I and II), so score swings should be read with caution.
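To make the caution about the small test set concrete, here is a minimal sketch of the arithmetic. It assumes a 30-problem set (AIME 2025 I and II combined) and a single evaluation run. It also treats problems as independent trials with equal success probability, which is a simplification and not something this page states. The function names are illustrative.

```python
import math

# Assumption: the full benchmark is AIME 2025 I + II, 30 problems in total.
N_PROBLEMS = 30


def points_per_problem(n: int = N_PROBLEMS) -> float:
    """Accuracy change, in percentage points, from solving one more problem."""
    return 100.0 / n


def standard_error(accuracy_pct: float, n: int = N_PROBLEMS) -> float:
    """Binomial standard error of a single-run accuracy, in percentage points.

    Assumes independent problems with equal success probability.
    """
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


print(round(points_per_problem(), 2))  # 3.33
print(round(standard_error(80.0), 2))  # 7.3
```

Under these assumptions a single problem moves the score by about 3.3 points, and a single-run score of 80 carries a standard error of roughly 7 points, so gaps of a few points between models may not be meaningful.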

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


Accuracy

Accuracy is the reported evaluation metric for AIME 2025. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
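As a rough illustration of how an accuracy figure like this can be computed, the sketch below scores integer answers by exact match and averages over several sampled runs. AIME answers are integers from 0 to 999. The averaging over runs is an assumption: this page does not state the evaluation protocol behind any listed score, although values that are not multiples of 1/30 suggest some form of averaging. All function names and data are hypothetical.

```python
from statistics import mean


def exact_match_accuracy(predictions: list[int], answers: list[int]) -> float:
    """Percentage of problems whose predicted integer equals the reference."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def mean_accuracy_over_runs(runs: list[list[int]], answers: list[int]) -> float:
    """Average the per-run accuracy over several sampled runs (assumed protocol)."""
    return mean(exact_match_accuracy(run, answers) for run in runs)


# Hypothetical toy data: four made-up problems, not real AIME answers.
answers = [12, 345, 678, 901]
runs = [
    [12, 345, 678, 0],  # 3 of 4 correct
    [12, 345, 1, 0],    # 2 of 4 correct
]

print(mean_accuracy_over_runs(runs, answers))  # 62.5
```

Because AIME I and II each have 15 problems, averaging the two per-exam accuracies gives the same number as pooling all 30 problems.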

Trust tiers for Accuracy: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score | Year | Links | Notes |
|------|-------|-------|-------|------|-------|-------|
| 01 | Step-3.5-Flash PaCoRe | unverified | 99.9 | 2026 | Paper, Code | |
| 02 | Step-3.5-Flash | unverified | 97.3 | 2026 | Paper, Code | |
| 03 | Kimi-K2.5 | unverified | 96.1 | 2026 | Paper, Code | |
| 04 | DeepSeek-V3.2-Speciale | unverified | 96 | 2025 | Paper, Source | |
| 05 | SU-01 | unverified | 94.6 | 2026 | Paper, Code | |
| 06 | Intern-S1-Pro | unverified | 93.1 | 2026 | Paper, Source | |
| 07 | DeepSeek-V3.2 | unverified | 93.1 | 2025 | Paper, Source | |
| 08 | o4-mini | verified | 92.7 | 2026 | Source | Average over AIME 2025 I+II. Source: OpenAI o4-mini system card (April 2025). |
| 09 | Qwen3-VL-235B-A22B-Thinking | unverified | 89.7 | 2025 | Paper, Code | |
| 10 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | unverified | 89.1 | 2025 | Paper, Code, Source | |
| 11 | Gemini 2.5 Pro | unverified | 88 | 2025 | Paper | |
| 12 | o3 | verified | 86.7 | 2026 | Source | Average over AIME 2025 I+II (30 problems). Source: OpenAI (2025). |
| 13 | Qwen3-Coder-Next | unverified | 83.07 | 2026 | Paper, Code | |
| 14 | Qwen3-235B-A22B | unverified | 81.5 | 2025 | Paper, Code | |
| 15 | Claude Opus 4.5 | verified | 80 | 2026 | Source | Average over AIME 2025 I+II. Source: Claude Opus 4.5 model card, Anthropic (2025). |
| 16 | Qwen3-VL-235B-A22B-Instruct | unverified | 74.7 | 2025 | Paper, Code | |
| 17 | Qwen3-Omni-Flash-Thinking | unverified | 74 | 2025 | Paper, Code | |
| 18 | Gemini 2.5 Flash | unverified | 72 | 2025 | Paper | |
| 19 | DeepSeek R1 | verified | 72 | 2026 | Source | Average AIME 2025 I+II (estimated from leaderboard). Source: DeepSeek-R1 technical report. |
| 20 | Qwen3-VL-8B-Instruct | unverified | 45.9 | 2025 | Paper, Code | |
| 21 | Trinity Large Preview | unverified | 24.36 | 2026 | Paper, Code | |
§ 04 · Submit a result

Add to the leaderboard.
