SWE-bench Verified contains 500 GitHub issues that were manually verified as solvable by human engineers. It is the primary benchmark for software engineering agents. Results are tracked for autonomous agent scaffolds, so scores reflect the full agent system rather than model capability alone.
Resolve Rate is the reported evaluation metric for SWE-bench Verified. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
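The resolve rate is the percentage of benchmark instances that an agent resolves. The sketch below is minimal and assumes the usual SWE-bench convention: an instance counts as resolved when the agent's patch makes the issue's previously failing tests pass and does not break tests that already passed. The function name `resolve_rate`, the `results` mapping, and the instance IDs are hypothetical illustrations. They are not part of any official evaluation harness.

```python
from typing import Mapping


def resolve_rate(results: Mapping[str, bool], total: int = 500) -> float:
    """Return the percentage of benchmark instances marked as resolved.

    Hypothetical helper: `results` maps an instance ID to True when the
    agent's patch resolved that instance. Instances missing from `results`
    count as unresolved, so the denominator is always the full benchmark
    size (500 for SWE-bench Verified by default).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    resolved = sum(1 for ok in results.values() if ok)
    if resolved > total:
        raise ValueError("more resolved instances than total instances")
    return 100.0 * resolved / total


# Example: 1 of 4 instances resolved gives a 25% resolve rate.
print(resolve_rate({"example-1": True, "example-2": False}, total=4))
```

The full benchmark size is the denominator, not just the instances the agent attempted. An agent that skips or crashes on a task therefore gets no credit for it.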