Codesota · Benchmark · SWE-bench Verified

SWE-bench Verified.

A set of 500 GitHub issues, each manually verified as solvable by human engineers. It is the primary benchmark for software engineering agents. Results are tracked from autonomous agent scaffolds, so a score reflects the full agent setup and not model capability alone.

§ 01 · Leaderboard

Results by metric.

Resolve Rate

Resolve Rate is the reported evaluation metric for SWE-bench Verified. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
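
As a rough illustration of how a score on this scale relates to the 500-issue set, the sketch below assumes Resolve Rate is simply the percentage of issues resolved. The function names `resolve_rate` and `issues_resolved` are hypothetical helpers, not part of any SWE-bench or Codesota tooling, and individual sources may round or report their scores differently.

```python
# Minimal sketch, assuming Resolve Rate = 100 * resolved / total.
# Both helpers are hypothetical and exist only for illustration.

TOTAL_ISSUES = 500  # size of SWE-bench Verified, as stated above


def resolve_rate(resolved: int, total: int = TOTAL_ISSUES) -> float:
    """Return the percentage of issues resolved, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return round(100 * resolved / total, 1)


def issues_resolved(rate: float, total: int = TOTAL_ISSUES) -> int:
    """Approximate issue count implied by a percentage score."""
    if not 0 <= rate <= 100:
        raise ValueError("rate must be between 0 and 100")
    return round(rate / 100 * total)


if __name__ == "__main__":
    print(resolve_rate(250))      # half of the issues resolved
    print(issues_resolved(50.0))  # inverse direction
```

With 500 issues, each resolved issue is worth 0.2 percentage points, so gaps of a few tenths between adjacent rows correspond to only one or two issues.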

Trust tiers for Resolve Rate: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score | Year |
|------|-------|-------|-------|------|
| 01 | Claude Mythos Preview | verified | 93.9 | 2026 |
| 02 | Claude Opus 4.5 | verified | 80.9 | 2026 |
| 03 | Claude Opus 4.6 | verified | 80.8 | 2026 |
| 04 | Gemini 3.1 Pro | verified | 80.6 | 2026 |
| 05 | MiniMax M2.5 | verified | 80.2 | 2026 |
| 06 | GPT-5.2 | verified | 80 | 2026 |
| 07 | Claude Sonnet 4.6 | verified | 79.6 | 2026 |
| 08 | Qwen3.6 Plus | verified | 78.8 | 2026 |
| 09 | MiMo-V2-Pro | verified | 78 | 2026 |
| 10 | Gemini 3 Flash | verified | 78 | 2026 |
| 11 | GLM-5 | verified | 77.8 | 2026 |
| 12 | Muse Spark | verified | 77.4 | 2026 |
| 13 | Kimi K2.5 | verified | 76.8 | 2026 |
| 14 | Seed 2.0 Pro | verified | 76.5 | 2026 |
| 15 | Qwen3.5-397B-A17B | verified | 76.4 | 2026 |
| 16 | GPT-5.1 Instant | verified | 76.3 | 2026 |
| 17 | GPT-5.1 Thinking | verified | 76.3 | 2026 |
| 18 | GPT-5.1 | verified | 76.3 | 2026 |
| 19 | Gemini 3 Pro | verified | 76.2 | 2026 |
| 20 | GPT-5 | verified | 74.9 | 2026 |
| 21 | MiMo-V2-Omni | verified | 74.8 | 2026 |
| 22 | GPT-5 Codex | verified | 74.5 | 2026 |
| 23 | Claude Opus 4.1 | verified | 74.5 | 2026 |
| 24 | Step-3.5-Flash | verified | 74.4 | 2026 |
| 25 | GLM-4.7 | verified | 73.8 | 2026 |
| 26 | GPT-5.1 Codex | verified | 73.7 | 2026 |
| 27 | Seed 2.0 Lite | verified | 73.5 | 2026 |
| 28 | MiMo-V2-Flash | verified | 73.4 | 2026 |
| 29 | Claude Haiku 4.5 | verified | 73.3 | 2026 |
| 30 | DeepSeek-V3.2-Speciale | verified | 73.1 | 2026 |
| 31 | DeepSeek-V3.2 (Thinking) | verified | 73.1 | 2026 |
| 32 | Claude Sonnet 4 | verified | 72.7 | 2026 |
| 33 | Claude Opus 4 | verified | 72.5 | 2026 |
| 34 | Qwen3.5-27B | verified | 72.4 | 2026 |
| 35 | Qwen3.5-122B-A10B | verified | 72 | 2026 |
| 36 | Kimi K2-Thinking-0905 | verified | 71.3 | 2026 |
| 37 | Grok Code Fast 1 | verified | 70.8 | 2026 |
| 38 | Claude 3.7 Sonnet | verified | 70.3 | 2026 |
| 39 | LongCat-Flash-Thinking-2601 | verified | 70 | 2026 |
| 40 | Qwen3-Coder 480B A35B | verified | 69.6 | 2026 |
| 41 | Qwen3 Max | verified | 69.6 | 2026 |
| 42 | MiniMax M2 | verified | 69.4 | 2026 |
| 43 | Qwen3.5-35B-A3B | verified | 69.2 | 2026 |
| 44 | o3 | verified | 69.1 | 2026 |
| 45 | o4-mini | verified | 68.1 | 2026 |
| 46 | GLM-4.6 | verified | 68 | 2026 |
| 47 | DeepSeek-V3.2-Exp | verified | 67.8 | 2026 |
| 48 | Gemini 2.5 Pro Preview | verified | 67.2 | 2026 |
| 49 | MiniMax M2.1 | verified | 67 | 2026 |
| 50 | DeepSeek-V3.1 | verified | 66 | 2026 |
| 51 | Kimi K2-Instruct-0905 | verified | 65.8 | 2026 |
| 52 | GLM-4.5 | verified | 64.2 | 2026 |
| 53 | Gemini 2.5 Pro | verified | 63.2 | 2026 |
| 54 | Devstral Medium | verified | 61.6 | 2026 |
| 55 | LongCat-Flash-Chat | verified | 60.4 | 2026 |
| 56 | Gemini 2.5 Flash | verified | 60.4 | 2026 |
| 57 | LongCat-Flash-Thinking | verified | 59.4 | 2026 |
| 58 | GLM-4.7-Flash | verified | 59.2 | 2026 |
| 59 | GLM-4.5-Air | verified | 57.6 | 2026 |
| 60 | MiniMax M1 80K | verified | 56 | 2026 |
| 61 | MiniMax M1 40K | verified | 55.6 | 2026 |
| 62 | GPT-4.1 | verified | 54.6 | 2026 |
| 63 | LongCat-Flash-Lite | verified | 54.4 | 2026 |
| 64 | NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | verified | 53.7 | 2026 |
| 65 | Devstral Small 1.1 | verified | 53.6 | 2026 |
| 66 | o3-mini | verified | 49.3 | 2026 |
| 67 | Claude 3.5 Sonnet | verified | 49 | 2026 |
| 68 | Sarvam-105B | verified | 45 | 2026 |
| 69 | DeepSeek-R1-0528 | verified | 44.6 | 2026 |
| 70 | DeepSeek-V3 | verified | 42 | 2026 |
| 71 | o1-preview | verified | 41.3 | 2026 |
| 72 | o1 | verified | 41 | 2026 |
| 73 | Claude 3.5 Haiku | verified | 40.6 | 2026 |
| 74 | Nemotron 3 Nano (30B) | verified | 38.8 | 2026 |
| 75 | GPT-4.5 | verified | 38 | 2026 |
| 76 | Sarvam-30B | verified | 34 | 2026 |
| 77 | GPT-4o | verified | 33.2 | 2026 |
| 78 | Gemini 2.5 Flash-Lite | verified | 31.6 | 2026 |
| 79 | GPT-4.1 mini | verified | 23.6 | 2026 |
| 80 | Gemini Diffusion | verified | 22.9 | 2026 |
| 81 | DeepSeek-V2.5 | verified | 16.8 | 2026 |