Who leads the AIME 2025 benchmark?

Step-3.5-Flash PaCoRe currently leads AIME 2025 with an Accuracy score of 99.9.

What is the state-of-the-art score on AIME 2025?

The state-of-the-art result on AIME 2025 is 99.9 (Accuracy), achieved by Step-3.5-Flash PaCoRe as of 2026.

How many models are tracked on AIME 2025?

Codesota tracks 21 models on AIME 2025.

When was the AIME 2025 leaderboard last updated?

The AIME 2025 leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2025.

Codesota · Benchmark · AIME 2025


AIME 2025.

Name: AIME 2025 Benchmark Results
Creator: Unknown
Published: 2025-01-01
License: https://creativecommons.org/licenses/by/4.0/

Olympiad-style short-answer math benchmark used in reasoning-model releases. The test set is small (30 problems across AIME 2025 I and II), so score swings should be read with caution.


§ 01 · SOTA history

[Chart: state-of-the-art Accuracy on AIME 2025, year over year]

§ 02 · Leaderboard

Results by metric.


Accuracy

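Accuracy is the evaluation metric reported for AIME 2025. It is the percentage of problems for which a model's final answer (an integer from 0 to 999) matches the reference answer. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.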
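The following is a minimal sketch of exact-match accuracy averaged over several sampled runs. It assumes each problem has one integer reference answer and each run yields either a parsed integer or None (unparseable, counted as wrong). The function name and the toy answers are hypothetical. Individual vendors' harnesses, sampling settings, and answer extraction are not described on this page and may differ.

```python
from typing import Optional, Sequence


def mean_accuracy(
    samples: Sequence[Sequence[Optional[int]]],
    references: Sequence[int],
) -> float:
    """Exact-match accuracy in percent, averaged over sampled runs.

    samples[i][j] is the integer answer extracted from run j on problem i,
    or None if no answer could be parsed (counted as wrong).
    """
    if not references or len(samples) != len(references):
        raise ValueError("need one non-empty list of samples per reference answer")
    per_problem = []
    for runs, ref in zip(samples, references):
        if not runs:
            raise ValueError("each problem needs at least one sampled run")
        correct = sum(1 for ans in runs if ans == ref)
        per_problem.append(correct / len(runs))
    # Mean over problems of the per-problem fraction of correct runs.
    return 100.0 * sum(per_problem) / len(per_problem)


if __name__ == "__main__":
    # Toy data, not real AIME 2025 answers: 3 problems, 2 runs each.
    refs = [204, 70, 33]
    runs = [[204, 204], [70, 71], [None, 33]]
    print(f"{mean_accuracy(runs, refs):.2f}")  # 66.67
```

On a single run, every AIME 2025 score is a multiple of 100/30. A reported value such as 99.9 therefore implies averaging over runs or some other aggregation, which is the pattern this sketch reproduces.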

Higher is better

Trust tiers for Accuracy: verified, paper, vendor, community, unverified.

Muted rows were not state of the art when published: an earlier or same-year result had already scored higher.

Rank	Model	Trust	Accuracy (%)	Year	Links	Notes
01	Step-3.5-Flash PaCoRe	unverified	99.9	2026	Paper, Code	
02	Step-3.5-Flash	unverified	97.3	2026	Paper, Code	
03	Kimi-K2.5	unverified	96.1	2026	Paper, Code	
04	DeepSeek-V3.2-Speciale	unverified	96	2025	Paper, Source	
05	SU-01	unverified	94.6	2026	Paper, Code	
06	Intern-S1-Pro	unverified	93.1	2026	Paper, Source	
07	DeepSeek-V3.2	unverified	93.1	2025	Paper, Source	
08	o4-mini	verified	92.7	2026	Source	Average over AIME 2025 I+II. Source: OpenAI o4-mini system card (April 2025).
09	Qwen3-VL-235B-A22B-Thinking	unverified	89.7	2025	Paper, Code	
10	NVIDIA-Nemotron-3-Nano-30B-A3B-BF16	unverified	89.1	2025	Paper, Code, Source	
11	Gemini 2.5 Pro	unverified	88	2025	Paper	
12	o3	verified	86.7	2026	Source	Average over AIME 2025 I+II (30 problems). Source: OpenAI (2025).
13	Qwen3-Coder-Next	unverified	83.07	2026	Paper, Code	
14	Qwen3-235B-A22B	unverified	81.5	2025	Paper, Code	
15	Claude Opus 4.5	verified	80	2026	Source	Average over AIME 2025 I+II. Source: Claude Opus 4.5 model card, Anthropic (2025).
16	Qwen3-VL-235B-A22B-Instruct	unverified	74.7	2025	Paper, Code	
17	Qwen3-Omni-Flash-Thinking	unverified	74	2025	Paper, Code	
18	Gemini 2.5 Flash	unverified	72	2025	Paper	
19	DeepSeek R1	verified	72	2026	Source	Average over AIME 2025 I+II (estimated from leaderboard). Source: DeepSeek-R1 technical report.
20	Qwen3-VL-8B-Instruct	unverified	45.9	2025	Paper, Code	
21	Trinity Large Preview	unverified	24.36	2026	Paper, Code	
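The muting rule and the year-over-year SOTA history can both be recomputed from the leaderboard rows. The sketch below follows one reading of the rule stated above. A row is muted when some result from the same or an earlier year scored strictly higher, and the SOTA for a year is the best score published in that year or earlier. This is not Codesota's code, and the helper names (`is_muted`, `sota_history`) are hypothetical.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    accuracy: float  # percent, as listed in the leaderboard
    year: int


# Rows copied from the AIME 2025 leaderboard (model, Accuracy, Year).
ROWS = [
    Result("Step-3.5-Flash PaCoRe", 99.9, 2026),
    Result("Step-3.5-Flash", 97.3, 2026),
    Result("Kimi-K2.5", 96.1, 2026),
    Result("DeepSeek-V3.2-Speciale", 96.0, 2025),
    Result("SU-01", 94.6, 2026),
    Result("Intern-S1-Pro", 93.1, 2026),
    Result("DeepSeek-V3.2", 93.1, 2025),
    Result("o4-mini", 92.7, 2026),
    Result("Qwen3-VL-235B-A22B-Thinking", 89.7, 2025),
    Result("NVIDIA-Nemotron-3-Nano-30B-A3B-BF16", 89.1, 2025),
    Result("Gemini 2.5 Pro", 88.0, 2025),
    Result("o3", 86.7, 2026),
    Result("Qwen3-Coder-Next", 83.07, 2026),
    Result("Qwen3-235B-A22B", 81.5, 2025),
    Result("Claude Opus 4.5", 80.0, 2026),
    Result("Qwen3-VL-235B-A22B-Instruct", 74.7, 2025),
    Result("Qwen3-Omni-Flash-Thinking", 74.0, 2025),
    Result("Gemini 2.5 Flash", 72.0, 2025),
    Result("DeepSeek R1", 72.0, 2026),
    Result("Qwen3-VL-8B-Instruct", 45.9, 2025),
    Result("Trinity Large Preview", 24.36, 2026),
]


def is_muted(row: Result, rows: list[Result]) -> bool:
    """True if a same-year or earlier result already scored strictly higher."""
    return any(o.year <= row.year and o.accuracy > row.accuracy for o in rows)


def sota_history(rows: list[Result]) -> dict[int, float]:
    """Best accuracy published in each year or any earlier year."""
    history = {}
    best = float("-inf")
    for year in sorted({r.year for r in rows}):
        best = max(best, max(r.accuracy for r in rows if r.year == year))
        history[year] = best
    return history


if __name__ == "__main__":
    print([r.model for r in ROWS if not is_muted(r, ROWS)])
    # ['Step-3.5-Flash PaCoRe', 'DeepSeek-V3.2-Speciale']
    print(sota_history(ROWS))  # {2025: 96.0, 2026: 99.9}
```

Under this reading, only the 2025 and 2026 leaders stay un-muted. The 2026 value matches the 99.9 state-of-the-art result quoted at the top of the page.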
