Code Generation · 2024 · python

SWE-bench Verified Subset

500 GitHub issues, each manually verified by human engineers as solvable. A high-quality subset of SWE-bench.

Metrics: resolve-rate
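To illustrate how a resolve-rate is computed, the sketch below assumes SWE-bench's usual criterion: an issue counts as resolved only if, after the model's patch is applied, the previously failing FAIL_TO_PASS tests now pass and the PASS_TO_PASS tests still pass. The `InstanceResult` record and its field names are hypothetical. The counts in the usage lines are also made up and do not come from any submission.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    # Hypothetical per-issue record; field names are illustrative only.
    instance_id: str
    fail_to_pass_ok: bool  # tests that failed before the patch now pass
    pass_to_pass_ok: bool  # tests that passed before the patch still pass


def is_resolved(result: InstanceResult) -> bool:
    # An issue is resolved only if both test groups succeed.
    return result.fail_to_pass_ok and result.pass_to_pass_ok


def resolve_rate(results: list[InstanceResult]) -> float:
    """Percentage of instances resolved (higher is better)."""
    if not results:
        raise ValueError("no results to score")
    resolved = sum(is_resolved(r) for r in results)
    return 100.0 * resolved / len(results)


# Made-up example: 400 of the 500 Verified issues resolved.
results = [
    InstanceResult(f"task-{i}", fail_to_pass_ok=i < 400, pass_to_pass_ok=True)
    for i in range(500)
]
print(resolve_rate(results))  # 80.0
```

Every issue carries equal weight, so in a single run over all 500 issues, one additional resolved issue moves the score by 0.2 points.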
Current State of the Art

Claude Opus 4.5 (Anthropic): 80.9 resolve-rate

SWE-bench Verified: resolve-rate

38 results · 1 SOTA advance · higher is better

[Chart: resolve-rate over time, showing all results and the SOTA frontier; current SOTA: Claude Opus 4.5]

resolve-rate Progress Over Time

Showing 2 breakthroughs from Jul 2025 to Mar 2026

[Chart: resolve-rate by date, Jul 2025 to Mar 2026]
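The SOTA frontier can be read as the sequence of results that beat every earlier score. Below is a minimal sketch under that assumption (higher is better, and a tie with the running best does not count as an advance). The inputs reuse scores from this page; the day of the month is a placeholder, since only months are given.

```python
from datetime import date


def sota_frontier(results):
    """Return the results that set a new best score, in date order.

    Assumes `results` is a list of (model, date, score) tuples and that
    higher scores are better; ties with the running best are skipped.
    """
    frontier = []
    best = float("-inf")
    # sorted() is stable, so entries sharing a date keep their input order.
    for model, when, score in sorted(results, key=lambda r: r[1]):
        if score > best:
            frontier.append((model, when, score))
            best = score
    return frontier


results = [
    ("Kimi-K2", date(2025, 7, 1), 65.8),           # day is a placeholder
    ("Claude Opus 4.5", date(2026, 3, 1), 80.9),   # day is a placeholder
    ("Claude Opus 4.6", date(2026, 3, 1), 80.8),   # does not beat 80.9
]
for model, when, score in sota_frontier(results):
    print(when.strftime("%b %Y"), model, score)
# Jul 2025 Kimi-K2 65.8
# Mar 2026 Claude Opus 4.5 80.9
```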

Key Milestones

Jul 2025: Kimi-K2, 65.8
Open-source, non-thinking mode.
Mar 2026: Claude Opus 4.5 (current SOTA), 80.9 (+22.9%)
First model to break 80%, with agent scaffolding.
Total improvement: +22.9% (relative; from 65.8 to 80.9, a gain of 15.1 points)
Time span: 9 months
Breakthroughs: 2
Current SOTA: 80.9
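The total-improvement figure matches a relative change from the first milestone to the current SOTA, (80.9 - 65.8) / 65.8 ≈ 22.9%, not the absolute gain of 15.1 points. A quick check, using only the two milestone scores listed above:

```python
def relative_improvement(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    return 100.0 * (new - old) / old


first_milestone = 65.8  # Kimi-K2, Jul 2025
current_sota = 80.9     # Claude Opus 4.5, Mar 2026

print(f"+{relative_improvement(first_milestone, current_sota):.1f}%")  # +22.9%
print(f"{current_sota - first_milestone:.1f} points")                  # 15.1 points
```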

Top Models Performance Comparison

Top 10 models ranked by resolve-rate

#   Model                   resolve-rate  % of best
1   Claude Opus 4.5         80.9          100.0%
2   Claude Opus 4.6         80.8          99.9%
3   Gemini 3.1 Pro          80.6          99.6%
4   MiniMax M2.5            80.2          99.1%
5   GPT-5.2 Thinking        80.0          98.9%
6   Claude Sonnet 4.6       79.6          98.4%
7   Gemini 3 Flash          78.0          96.4%
8   Claude Sonnet 4.5       77.2          95.4%
9   Kimi K2.5               76.8          94.9%
10  GPT-5.1                 76.3          94.3%
Best score: 80.9 · Top model: Claude Opus 4.5 · Models compared: 10 · Score range: 4.6 points
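The "% of best" column divides each score by the top score, and the score range is the gap between first and tenth place. The sketch below reproduces both from the ten scores listed above. Rounding to one decimal is an assumption about how the page displays values.

```python
top10 = {
    "Claude Opus 4.5": 80.9,
    "Claude Opus 4.6": 80.8,
    "Gemini 3.1 Pro": 80.6,
    "MiniMax M2.5": 80.2,
    "GPT-5.2 Thinking": 80.0,
    "Claude Sonnet 4.6": 79.6,
    "Gemini 3 Flash": 78.0,
    "Claude Sonnet 4.5": 77.2,
    "Kimi K2.5": 76.8,
    "GPT-5.1": 76.3,
}

best = max(top10.values())
# Each model's score as a percentage of the best score, to one decimal.
pct_of_best = {model: round(100.0 * score / best, 1) for model, score in top10.items()}
# Spread between the best and worst of the ten, in resolve-rate points.
score_range = round(best - min(top10.values()), 1)

for model, pct in pct_of_best.items():
    print(f"{model:<20}{pct:>6.1f}%")
print("Score range:", score_range)  # Score range: 4.6
```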

resolve-rate (primary metric)

#   Model                   Organization   Access        Score  Date
1   Claude Opus 4.5         Anthropic      API           80.9   Mar 2026
2   Claude Opus 4.6         Anthropic      API           80.8   Mar 2026
3   Gemini 3.1 Pro          Google         API           80.6   Mar 2026
4   MiniMax M2.5            MiniMax        API           80.2   Mar 2026
5   GPT-5.2 Thinking        OpenAI         API           80.0   Mar 2026
6   Claude Sonnet 4.6       Anthropic      API           79.6   Mar 2026
7   Gemini 3 Flash          Google         API           78.0   Mar 2026
8   Claude Sonnet 4.5       Anthropic      API           77.2   Mar 2026
9   Kimi K2.5               Moonshot AI    Open Source   76.8   Mar 2026
10  GPT-5.1                 OpenAI         API           76.3   Mar 2026
11  Gemini 3 Pro            Google                       76.2   Mar 2026
12  GPT-5                   OpenAI         API           74.9   Mar 2026
13  MiniMax M2.1            MiniMax        API           74.0   Mar 2026
14  Claude Haiku 4.5        Anthropic      API           73.3   Mar 2026
15  Claude Sonnet 4         Anthropic      API           72.7   Mar 2026
16  Claude Opus 4           Anthropic      API           72.5   Mar 2026
17  Devstral 2              Mistral        Open Source   72.2   Mar 2026
18  Qwen3-Coder 480B A35B   Alibaba Cloud  Open Source   69.6   Mar 2026
19  MiniMax M2              MiniMax        API           69.4   Mar 2026
20  o3                      OpenAI         API           69.1   Mar 2026
21  o4-mini                 OpenAI         API           68.1   Mar 2026
22  DeepSeek-V3.1           DeepSeek       Open Source   66.0   Mar 2026
23  Kimi-K2                 Moonshot AI    Open Source   65.8   Mar 2026
24  Grok 3                  xAI            API           63.8   Mar 2026
25  Gemini 2.5 Pro          Google         API           63.8   Mar 2026
26  Claude 3.7 Sonnet       Anthropic      API           63.7   Mar 2026
27  Gemini 2.5 Flash        Google         API           60.4   Mar 2026
28  DeepSeek-R1-0528        DeepSeek       Open Source   57.6   Mar 2026
29  o3-mini                 OpenAI         API           55.8   Mar 2026
30  GPT-4.1                 OpenAI         API           54.6   Mar 2026
31  Claude 3.5 Sonnet       Anthropic      API           50.8   Mar 2026
32  DeepSeek-R1             DeepSeek       Open Source   49.2   Mar 2026
33  o1                      OpenAI         API           48.9   Mar 2026
34  Devstral Small 2505     Mistral        Open Source   46.8   Mar 2026
35  DeepSeek-V3             DeepSeek       Open Source   42.0   Mar 2026
36  GPT-4o                  OpenAI         API           41.2   Mar 2026
37  Claude 3.5 Haiku        Anthropic      API           40.6   Mar 2026
38  DeepSeek-V2.5           DeepSeek       Open Source   37.0   Mar 2026

Other Code Generation Datasets