Code Generation | 2023 | Python
SWE-bench: Software Engineering Benchmark
2,294 real GitHub issues drawn from popular Python repositories. Tests a model's ability to resolve real-world software engineering tasks.
No benchmark results indexed for this dataset yet.