Who leads the SWE-bench Verified benchmark?

Claude Opus 4.5 currently leads SWE-bench Verified with a Pct Resolved score of 80.9.

What is the state-of-the-art score on SWE-bench Verified?

The state-of-the-art result on SWE-bench Verified is 80.9 (Pct Resolved), achieved by Claude Opus 4.5 as of 2026.

How many models are tracked on SWE-bench Verified?

Codesota tracks 3 models on SWE-bench Verified.

When was the SWE-bench Verified leaderboard last updated?

The SWE-bench Verified leaderboard on Codesota includes results through 2026.

Codesota · Benchmark · SWE-bench Verified

SWE-bench Verified.

Name: SWE-bench Verified Benchmark Results
Creator: Unknown
Published: 2026-01-01
License: https://creativecommons.org/licenses/by/4.0/

A human-validated subset of 500 GitHub issues from real Python repositories. For each issue, a model must produce a patch that passes hidden tests. It is a standard end-to-end benchmark for autonomous coding agents, covering repo navigation, editing, and testing.

§ 01 · Leaderboard

Results by metric.

Only 3 models on this benchmark

Pct Resolved

Pct Resolved is the reported evaluation metric for SWE-bench Verified. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
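As a rough illustration of how a percent-resolved metric of this kind is typically computed, the sketch below divides the number of resolved instances by the number evaluated. The function name `pct_resolved` and the instance ids are hypothetical, and the official evaluation harness may handle details such as missing or errored runs differently.

```python
def pct_resolved(outcomes: dict[str, bool]) -> float:
    """Return the percentage of instances whose patch passed the hidden tests.

    `outcomes` maps an instance id to True (resolved) or False (not resolved).
    """
    if not outcomes:
        raise ValueError("no instances evaluated")
    resolved = sum(outcomes.values())  # True counts as 1, False as 0
    return 100.0 * resolved / len(outcomes)


# Hypothetical per-instance results, for illustration only.
example = {
    "repo-a-101": True,
    "repo-b-202": False,
    "repo-c-303": True,
    "repo-d-404": True,
}
print(pct_resolved(example))  # 75.0
```

On the full benchmark the denominator would be the 500 issues in the subset.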

Trust tiers for Pct Resolved: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published: an earlier or same-year result already scored better.

Rank	Model	Notes	Status	Trust	Score	Year
01	Claude Opus 4.5	Top score on SWE-bench Verified leaderboard (Anthropic reported).	seed, verify	paper	80.9	2026
02	Gemini 3 Pro	Gemini 3 Pro on SWE-bench Verified via VALS AI bash-agent harness.	seed, verify	paper	78.8	2026
03	GPT-5 Codex	GPT-5 Codex on SWE-bench Verified.	seed, verify	paper	74.9	2026
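
The muted-row rule stated above the table can be sketched as follows. This is one reading of the rule, assuming results are compared at year granularity and that a row is muted whenever some other row from the same or an earlier year has a strictly higher score. The `Row` class and `is_muted` function are hypothetical names, not Codesota's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    score: float  # Pct Resolved, higher is better
    year: int


def is_muted(row: Row, rows: list[Row]) -> bool:
    """True if an earlier or same-year result already scored better than `row`."""
    return any(
        other.year <= row.year and other.score > row.score
        for other in rows
    )


# Scores and years taken from the rows listed above.
rows = [
    Row("Claude Opus 4.5", 80.9, 2026),
    Row("Gemini 3 Pro", 78.8, 2026),
    Row("GPT-5 Codex", 74.9, 2026),
]

muted = [r.model for r in rows if is_muted(r, rows)]
print(muted)  # ['Gemini 3 Pro', 'GPT-5 Codex']
```

Under this reading, only the top row counts as state of the art, because all three results share the same year.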
