Who leads the SWE-bench Verified benchmark?

Claude Mythos Preview currently leads SWE-bench Verified with a Resolve Rate of 93.9%.

What is the state-of-the-art score on SWE-bench Verified?

The state-of-the-art result on SWE-bench Verified is a Resolve Rate of 93.9%, achieved by Claude Mythos Preview as of 2026.

How many models are tracked on SWE-bench Verified?

Codesota tracks 81 models on SWE-bench Verified.

When was the SWE-bench Verified leaderboard last updated?

The SWE-bench Verified leaderboard on Codesota includes results through 2026.


SWE-bench Verified.

Name: SWE-bench Verified Benchmark Results
Creator: Unknown
Published: 2026-01-01
License: https://creativecommons.org/licenses/by/4.0/

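SWE-bench Verified is a set of 500 manually verified GitHub issues that human engineers confirmed to be solvable. It is the primary benchmark for software engineering agents. Results are tracked from autonomous scaffolds, so a score reflects the agent setup around a model and not just the model's own capability.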


§ 01 · Leaderboard

Results by metric.


Resolve Rate

Resolve Rate is the reported evaluation metric for SWE-bench Verified. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
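As a rough illustration of how a Resolve Rate figure is usually read, the sketch below assumes the metric is the percentage of the benchmark's 500 issues that a run resolved. That definition, the function name `resolve_rate`, and the example count are assumptions made for illustration; they are not taken from Codesota's own tooling.

```python
def resolve_rate(resolved: int, total: int = 500) -> float:
    """Return the share of issues resolved as a percentage, to one decimal.

    Assumption: Resolve Rate = resolved issues / total issues * 100.
    The default total of 500 matches the size of SWE-bench Verified.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return round(100.0 * resolved / total, 1)


# Hypothetical run that resolves half of the 500 issues.
print(resolve_rate(250))  # 50.0
```

Under this reading, one additional resolved issue moves the score by 0.2 points, which is worth keeping in mind when comparing rows that differ by only a few tenths.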

Trust tiers for Resolve Rate: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published: an earlier or same-year result already scored better.
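The muting rule can be sketched in a few lines. This is one possible reading of it, assuming a row is muted when any other result from the same or an earlier year has a strictly higher score. The `Result` type, the `muted_models` function, and the model names are hypothetical, and the page does not say how ties or ordering within a year are handled.

```python
from typing import Iterable, NamedTuple, Set


class Result(NamedTuple):
    model: str
    score: float
    year: int


def muted_models(results: Iterable[Result]) -> Set[str]:
    """Return models that were not state of the art when published.

    Assumption: a result is muted if some result from the same year
    or an earlier year has a strictly higher score.
    """
    rows = list(results)
    muted = set()
    for row in rows:
        if any(o.year <= row.year and o.score > row.score for o in rows):
            muted.add(row.model)
    return muted


# Hypothetical data, not taken from the leaderboard.
example = [
    Result("model-a", 50.0, 2024),
    Result("model-b", 45.0, 2025),
    Result("model-c", 60.0, 2025),
    Result("model-d", 55.0, 2026),
]
print(sorted(muted_models(example)))  # ['model-b', 'model-d']
```

Because the comparison uses only the year, every row that shares a year with a higher-scoring row is muted under this sketch. A finer-grained rule would need publication dates, which the table does not list.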

Rank	Model	Trust	Score	Year
01	Claude Mythos Preview	verified	93.9	2026
02	Claude Opus 4.5	verified	80.9	2026
03	Claude Opus 4.6	verified	80.8	2026
04	Gemini 3.1 Pro	verified	80.6	2026
05	MiniMax M2.5	verified	80.2	2026
06	GPT-5.2	verified	80	2026
07	Claude Sonnet 4.6	verified	79.6	2026
08	Qwen3.6 Plus	verified	78.8	2026
09	MiMo-V2-Pro	verified	78	2026
10	Gemini 3 Flash	verified	78	2026
11	GLM-5	verified	77.8	2026
12	Muse Spark	verified	77.4	2026
13	Kimi K2.5	verified	76.8	2026
14	Seed 2.0 Pro	verified	76.5	2026
15	Qwen3.5-397B-A17B	verified	76.4	2026
16	GPT-5.1 Instant	verified	76.3	2026
17	GPT-5.1 Thinking	verified	76.3	2026
18	GPT-5.1	verified	76.3	2026
19	Gemini 3 Pro	verified	76.2	2026
20	GPT-5	verified	74.9	2026
21	MiMo-V2-Omni	verified	74.8	2026
22	GPT-5 Codex	verified	74.5	2026
23	Claude Opus 4.1	verified	74.5	2026
24	Step-3.5-Flash	verified	74.4	2026
25	GLM-4.7	verified	73.8	2026
26	GPT-5.1 Codex	verified	73.7	2026
27	Seed 2.0 Lite	verified	73.5	2026
28	MiMo-V2-Flash	verified	73.4	2026
29	Claude Haiku 4.5	verified	73.3	2026
30	DeepSeek-V3.2-Speciale	verified	73.1	2026
31	DeepSeek-V3.2 (Thinking)	verified	73.1	2026
32	Claude Sonnet 4	verified	72.7	2026
33	Claude Opus 4	verified	72.5	2026
34	Qwen3.5-27B	verified	72.4	2026
35	Qwen3.5-122B-A10B	verified	72	2026
36	Kimi K2-Thinking-0905	verified	71.3	2026
37	Grok Code Fast 1	verified	70.8	2026
38	Claude 3.7 Sonnet	verified	70.3	2026
39	LongCat-Flash-Thinking-2601	verified	70	2026
40	Qwen3-Coder 480B A35B	verified	69.6	2026
41	Qwen3 Max	verified	69.6	2026
42	MiniMax M2	verified	69.4	2026
43	Qwen3.5-35B-A3B	verified	69.2	2026
44	o3	verified	69.1	2026
45	o4-mini	verified	68.1	2026
46	GLM-4.6	verified	68	2026
47	DeepSeek-V3.2-Exp	verified	67.8	2026
48	Gemini 2.5 Pro Preview	verified	67.2	2026
49	MiniMax M2.1	verified	67	2026
50	DeepSeek-V3.1	verified	66	2026
51	Kimi K2-Instruct-0905	verified	65.8	2026
52	GLM-4.5	verified	64.2	2026
53	Gemini 2.5 Pro	verified	63.2	2026
54	Devstral Medium	verified	61.6	2026
55	LongCat-Flash-Chat	verified	60.4	2026
56	Gemini 2.5 Flash	verified	60.4	2026
57	LongCat-Flash-Thinking	verified	59.4	2026
58	GLM-4.7-Flash	verified	59.2	2026
59	GLM-4.5-Air	verified	57.6	2026
60	MiniMax M1 80K	verified	56	2026
61	MiniMax M1 40K	verified	55.6	2026
62	GPT-4.1	verified	54.6	2026
63	LongCat-Flash-Lite	verified	54.4	2026
64	NVIDIA-Nemotron-3-Super-120B-A12B-BF16	verified	53.7	2026
65	Devstral Small 1.1	verified	53.6	2026
66	o3-mini	verified	49.3	2026
67	Claude 3.5 Sonnet	verified	49	2026
68	Sarvam-105B	verified	45	2026
69	DeepSeek-R1-0528	verified	44.6	2026
70	DeepSeek-V3	verified	42	2026
71	o1-preview	verified	41.3	2026
72	o1	verified	41	2026
73	Claude 3.5 Haiku	verified	40.6	2026
74	Nemotron 3 Nano (30B)	verified	38.8	2026
75	GPT-4.5	verified	38	2026
76	Sarvam-30B	verified	34	2026
77	GPT-4o	verified	33.2	2026
78	Gemini 2.5 Flash-Lite	verified	31.6	2026
79	GPT-4.1 mini	verified	23.6	2026
80	Gemini Diffusion	verified	22.9	2026
81	DeepSeek-V2.5	verified	16.8	2026
