Code Generation · 2024 · python
SWE-bench Verified Subset
500 manually verified GitHub issues confirmed solvable by human engineers. High-quality subset of SWE-bench.
Metrics: resolve-rate
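As a rough illustration of the metric, the sketch below assumes resolve-rate is the percentage of the benchmark's 500 issues that a model resolves. The function name and the sample count are hypothetical and are not taken from any leaderboard entry.

```python
def resolve_rate(resolved: int, total: int = 500) -> float:
    """Return the percentage of benchmark issues resolved, to one decimal.

    Assumption: resolve-rate = 100 * resolved / total, with total
    defaulting to the 500 issues in the verified subset.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return round(100.0 * resolved / total, 1)


# Hypothetical run: 400 of 500 issues resolved.
print(resolve_rate(400))  # 80.0
```

Reported scores need not correspond to a whole number of issues; a published figure may, for example, be averaged over several runs.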
Current State of the Art
Claude Opus 4.5 (Anthropic): 80.9 resolve-rate
SWE-bench Verified: resolve-rate
38 results · 1 SOTA advance · higher is better
resolve-rate Progress Over Time
Showing 2 breakthroughs from Jul 2025 to Mar 2026
Key Milestones
Total Improvement: 22.9%
Time Span: 9 months
Breakthroughs: 2
Current SOTA: 80.9
Top Models Performance Comparison
Top 10 models ranked by resolve-rate
Best Score: 80.9
Top Model: Claude Opus 4.5
Models Compared: 10
Score Range: 4.6
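The Best Score and Score Range figures can be reproduced from the ten highest scores listed on this leaderboard. The sketch below assumes that Score Range means the gap between the best and the tenth-best score; the variable names are illustrative.

```python
# Top 10 resolve-rate scores, copied from the leaderboard in rank order.
top10_scores = [80.9, 80.8, 80.6, 80.2, 80.0, 79.6, 78.0, 77.2, 76.8, 76.3]

best_score = max(top10_scores)
# Round to one decimal to avoid floating-point noise in the subtraction.
score_range = round(best_score - min(top10_scores), 1)

print(f"Best Score: {best_score}")    # Best Score: 80.9
print(f"Score Range: {score_range}")  # Score Range: 4.6
```

Under that assumption the top ten models sit within 4.6 points of each other, matching the summary above.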
resolve-rate (primary metric)
| # | Model | Access | Organization | Score | Date |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.5 | API | Anthropic | 80.9 | Mar 2026 |
| 2 | Claude Opus 4.6 | API | Anthropic | 80.8 | Mar 2026 |
| 3 | Gemini 3.1 Pro | API | Google | 80.6 | Mar 2026 |
| 4 | MiniMax M2.5 | API | MiniMax | 80.2 | Mar 2026 |
| 5 | GPT-5.2 Thinking | API | OpenAI | 80.0 | Mar 2026 |
| 6 | Claude Sonnet 4.6 | API | Anthropic | 79.6 | Mar 2026 |
| 7 | Gemini 3 Flash | API | Google | 78.0 | Mar 2026 |
| 8 | Claude Sonnet 4.5 | API | Anthropic | 77.2 | Mar 2026 |
| 9 | Kimi K2.5 | Open Source | Moonshot AI | 76.8 | Mar 2026 |
| 10 | GPT-5.1 | API | OpenAI | 76.3 | Mar 2026 |
| 11 | Gemini 3 Pro | | Google | 76.2 | Mar 2026 |
| 12 | GPT-5 | API | OpenAI | 74.9 | Mar 2026 |
| 13 | MiniMax M2.1 | API | MiniMax | 74.0 | Mar 2026 |
| 14 | Claude Haiku 4.5 | API | Anthropic | 73.3 | Mar 2026 |
| 15 | Claude Sonnet 4 | API | Anthropic | 72.7 | Mar 2026 |
| 16 | Claude Opus 4 | API | Anthropic | 72.5 | Mar 2026 |
| 17 | Devstral 2 | Open Source | Mistral | 72.2 | Mar 2026 |
| 18 | Qwen3-Coder 480B A35B | Open Source | Alibaba Cloud | 69.6 | Mar 2026 |
| 19 | MiniMax M2 | API | MiniMax | 69.4 | Mar 2026 |
| 20 | o3 | API | OpenAI | 69.1 | Mar 2026 |
| 21 | o4-mini | API | OpenAI | 68.1 | Mar 2026 |
| 22 | DeepSeek-V3.1 | Open Source | DeepSeek | 66.0 | Mar 2026 |
| 23 | Kimi-K2 | Open Source | Moonshot AI | 65.8 | Mar 2026 |
| 24 | Grok 3 | API | xAI | 63.8 | Mar 2026 |
| 25 | Gemini 2.5 Pro | API | Google | 63.8 | Mar 2026 |
| 26 | Claude 3.7 Sonnet | API | Anthropic | 63.7 | Mar 2026 |
| 27 | Gemini 2.5 Flash | API | Google | 60.4 | Mar 2026 |
| 28 | DeepSeek-R1-0528 | Open Source | DeepSeek | 57.6 | Mar 2026 |
| 29 | o3-mini | API | OpenAI | 55.8 | Mar 2026 |
| 30 | GPT-4.1 | API | OpenAI | 54.6 | Mar 2026 |
| 31 | Claude 3.5 Sonnet | API | Anthropic | 50.8 | Mar 2026 |
| 32 | DeepSeek-R1 | Open Source | DeepSeek | 49.2 | Mar 2026 |
| 33 | o1 | API | OpenAI | 48.9 | Mar 2026 |
| 34 | Devstral Small 2505 | Open Source | Mistral | 46.8 | Mar 2026 |
| 35 | DeepSeek-V3 | Open Source | DeepSeek | 42.0 | Mar 2026 |
| 36 | GPT-4o | API | OpenAI | 41.2 | Mar 2026 |
| 37 | Claude 3.5 Haiku | API | Anthropic | 40.6 | Mar 2026 |
| 38 | DeepSeek-V2.5 | Open Source | DeepSeek | 37.0 | Mar 2026 |