Code Generation | 2023 | Python

SWE-bench: Software Engineering Benchmark

SWE-bench consists of 2,294 real GitHub issues drawn from popular Python repositories. It tests a model's ability to resolve real-world software engineering tasks.

Metrics: resolve-rate
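
As a rough illustration of the metric, the sketch below assumes resolve-rate is the fraction of task instances marked as resolved. The function name, the instance IDs, and the outcome values are hypothetical; this is not the benchmark's official evaluation harness, which decides whether an issue counts as resolved.

```python
from typing import Mapping


def resolve_rate(resolved_by_instance: Mapping[str, bool]) -> float:
    """Return the fraction of task instances marked as resolved.

    Assumption: each instance maps to True if the issue was resolved
    and False otherwise. How "resolved" is decided is left to the
    evaluation harness and is not modeled here.
    """
    if not resolved_by_instance:
        raise ValueError("no task instances to score")
    resolved = sum(1 for ok in resolved_by_instance.values() if ok)
    return resolved / len(resolved_by_instance)


# Hypothetical outcomes for four task instances (IDs are made up).
example_outcomes = {
    "repo-a__issue-1": True,
    "repo-a__issue-2": False,
    "repo-b__issue-7": True,
    "repo-c__issue-3": False,
}

print(f"resolve-rate: {resolve_rate(example_outcomes):.1%}")
```

On the full dataset the denominator would be the 2,294 issues, so every unresolved or unattempted issue lowers the score.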
No benchmark results indexed for this dataset yet.
