SWE-Bench
2,294 real GitHub issues drawn from popular Python repositories. The benchmark tests a model's ability to resolve real-world software engineering tasks.
Benchmark Stats
Models: 0
Papers: 0
Metrics: 0
SOTA History
Not enough data to show a trend.
No benchmark results available yet for SWE-Bench.