swe-bench-verified
Software engineering benchmark
Total Results: 13
Models Tested: 13
Metrics: 1
Last Updated: 2026-03-06
Metric: resolve-rate (higher is better)
| Rank | Model | Score | Source |
|---|---|---|---|
| 1 | claude-sonnet-45 | 72.7 | anthropic-blog |
| 2 | claude-sonnet-4 | 72.2 | anthropic-blog |
| 3 | o3 | 71.7 | swebench-leaderboard |
| 4 | gemini-25-pro | 63.8 | google-blog |
| 5 | claude-37-sonnet | 62.3 | anthropic-blog |
| 6 | o3-mini | 55.8 | swebench-leaderboard |
| 7 | gpt-41 | 54.6 | swebench-leaderboard |
| 8 | deepseek-r1 | 49.2 | swebench-leaderboard |
| 9 | claude-35-sonnet | 49.0 | anthropic-blog |
| 10 | o1 | 48.9 | swebench-leaderboard |
| 11 | deepseek-v3 | 42.0 | swebench-leaderboard |
| 12 | gpt-4o | 41.2 | swebench-leaderboard |
| 13 | deepseek-v25 | 37.0 | deepseek-blog |