Codesota · Benchmark · ARC-AGI-2

ARC-AGI-2.

ARC-AGI-2 is the harder successor to ARC-AGI-1, released in 2025. It is designed to be more resistant to test-time compute scaling. Scores are reported as percent accuracy on the public evaluation set.

§ 01 · Leaderboard

Results by metric.

Only 3 models on this benchmark

Accuracy

Accuracy is the reported evaluation metric for ARC-AGI-2. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
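As a rough illustration of how a percent-accuracy score of this kind could be computed, here is a minimal sketch. It assumes exact-match scoring with a single predicted output grid per task, which is a simplification and not necessarily the official ARC Prize procedure. The function name `accuracy_percent` and the toy grids are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # a grid is a list of rows of integer color codes


def accuracy_percent(predictions: List[Grid], targets: List[Grid]) -> float:
    """Percent of tasks whose predicted grid matches the target exactly.

    Assumption: one prediction per task and all-or-nothing scoring per task.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one task")
    solved = sum(1 for p, t in zip(predictions, targets) if p == t)
    return 100.0 * solved / len(targets)


# Toy data (hypothetical): 2 of the 4 predictions match their targets.
targets = [[[1, 0], [0, 1]], [[2, 2]], [[3]], [[0, 0], [0, 0]]]
predictions = [[[1, 0], [0, 1]], [[2, 0]], [[3]], [[0, 0], [0, 1]]]
print(accuracy_percent(predictions, targets))  # 50.0
```

Under this scoring a grid that is wrong in a single cell earns no credit, so the score counts fully solved tasks only.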

Trust tiers for Accuracy: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score (%) | Year | Notes |
|------|-------|-------|-----------|------|-------|
| 01 | Gemini 2.5 Pro | verified | 5.00 | 2026 | Public evaluation set. Source: ARC Prize leaderboard (2025). |
| 02 | o3 | verified | 4.00 | 2026 | Public evaluation set. Released May 2025. Much harder than v1; o3 only scores ~4%. Source: ARC Prize blog (2025). |
| 03 | o4-mini | verified | 3.00 | 2026 | Public evaluation set. Source: ARC Prize leaderboard (2025). |