Best agent for SWE-Bench — April 2026
No single winner. The right answer depends on whether you optimize for quality, cost, or open-source. Here is the current leaderboard, the Pareto frontier, and a flowchart to pick yours.
#1 Quality
Claude Code + Opus 4.7
~87.6% SWE-Bench Verified via Anthropic's internal harness. Highest reported score on any harness.
Cost: ~$6 per full eval run.
Best $ / %
Claude Code + Sonnet 4.5
77.2% at ~$1.30 per run. Roughly 4.6x cheaper than Opus for about 10 points less.
Most teams default here.
#1 Open-Weight
MiniMax M2.5 (self-host)
80.2% via mini-SWE-agent. First open model over 80% with no vendor lock-in.
Runs on 4xH100.
Top 15 leaderboard — all agent/model combos
Best reported pass@1 on SWE-Bench Verified, April 2026.
Pareto frontier — quality at the cheapest price
Points on the pink line are not dominated: for every point below the line, some frontier point is at least as cheap and at least as accurate, and strictly better on one of the two. Pick a point on the frontier and you are making an intentional trade.
The money visual
SWE-Bench Verified cost/perf, April 2026
X: $ per resolved issue (log scale). Y: Verified %. Pink line = Pareto frontier.
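The dominance rule behind the pink line is mechanical. A minimal sketch, using the costs and scores from the summary cards above; the MiniMax per-run cost and the "Hypothetical X" combo are made up for illustration:

```python
def pareto_frontier(points):
    """Return names of (name, cost, score) combos not dominated by any other.

    A combo is dominated when another combo is at least as cheap AND
    at least as accurate, and strictly better on one of the two axes.
    """
    frontier = []
    for name, cost, score in points:
        dominated = any(
            c <= cost and s >= score and (c < cost or s > score)
            for n, c, s in points
            if n != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Cost per run ($) and Verified %; MiniMax cost and "Hypothetical X"
# are illustrative assumptions, not reported numbers.
combos = [
    ("Claude Code + Opus 4.7", 6.00, 87.6),
    ("Claude Code + Sonnet 4.5", 1.30, 77.2),
    ("MiniMax M2.5", 2.00, 80.2),
    ("Hypothetical X", 3.00, 76.0),  # dominated by Sonnet 4.5: pricier and worse
]
print(pareto_frontier(combos))  # Hypothetical X drops out
```

The three card picks survive because each is either the cheapest or the most accurate among its rivals; the dominated combo is strictly beaten on both axes by Sonnet 4.5.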
Decision tree
Pick the question that matters most. Leaves are recommendations for April 2026.
Decision guide
Which agent fits your workflow?
Quick tradeoffs
| If you... | Pick this | Why |
|---|---|---|
| Care only about the score | Claude Code + Opus 4.7 | Highest reported Verified % as of April 2026 |
| Need best $/% | Claude Code + Sonnet 4.5 | 77% at $1.30 is the sweet spot |
| Want all-open-source | MiniMax M2.5 or OpenHands + DeepSeek V3.2 | 80% open, self-hostable |
| Want hands-off overnight tickets | Devin | The only one that runs unsupervised for hours |
| Live in an IDE | Cursor Composer 2 | Best-in-class IDE integration |
| Are writing a paper | mini-SWE-agent + your model | 100 LOC, fully reproducible baseline |
| Run many cheap tasks | Codex CLI + GPT-5.2 Mini | Cheapest Codex tier that still solves real bugs |
| Want MCP tooling | Claude Code | Largest MCP ecosystem |
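The table above reduces to a small lookup. A minimal sketch, where the priority keys are illustrative labels of my own (not official names) and the fallback mirrors the "most teams default here" note:

```python
# Maps a top priority to the April 2026 pick from the tradeoffs table.
RECOMMENDATIONS = {
    "max_score": "Claude Code + Opus 4.7",
    "best_value": "Claude Code + Sonnet 4.5",
    "open_source": "MiniMax M2.5 or OpenHands + DeepSeek V3.2",
    "unsupervised": "Devin",
    "ide": "Cursor Composer 2",
    "paper_baseline": "mini-SWE-agent + your model",
    "cheap_bulk": "Codex CLI + GPT-5.2 Mini",
    "mcp": "Claude Code",
}

def recommend(priority: str) -> str:
    """Return the recommended agent/model combo for a top priority.

    Unknown priorities fall back to the best-$/% default.
    """
    return RECOMMENDATIONS.get(priority, "Claude Code + Sonnet 4.5")

print(recommend("open_source"))
```

A flat dict is enough here because each leaf of the flowchart depends on a single question; a real router with compound conditions (budget caps plus open-source requirements, say) would want nested branching instead.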