SWE-Bench

SWE-Bench consists of 2,294 real GitHub issues drawn from popular Python repositories. It tests a model's ability to resolve real-world software engineering tasks.

Benchmark Stats

Models: 0
Papers: 0
Metrics: 0

SOTA History

Not enough data to show trend.

No results yet on this benchmark

Help build the community leaderboard by submitting your model results.

Check back soon as we continue collecting data.
