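Defects4J is a standard program repair benchmark of 835 real bugs from 17 open-source Java projects. Each bug comes with the developer's fix and a test suite that includes at least one bug-triggering test. The primary metric is the number of bugs correctly fixed. A patch is plausible if it passes the full test suite, and correct if it is plausible and also semantically equivalent to the developer's fix.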
Correct Patches is the reported evaluation metric for Defects4J. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
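The distinction between plausible and correct patches, and the rule that each bug counts at most once, can be sketched as follows. This is a minimal illustration that assumes every candidate patch has already been run against its bug's test suite and manually checked against the developer fix. The `PatchResult` type, its fields, the bug IDs, and the helper functions are hypothetical and are not part of the Defects4J tooling.

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    # Hypothetical record for one candidate patch (not a Defects4J API).
    bug_id: str              # illustrative ID such as "Chart-1"
    passes_all_tests: bool   # full test suite passes, including bug-triggering tests
    equivalent_to_fix: bool  # manually judged semantically equivalent to the developer fix


def is_plausible(p: PatchResult) -> bool:
    # Plausible: the patch passes every test.
    return p.passes_all_tests


def is_correct(p: PatchResult) -> bool:
    # Correct: plausible AND semantically equivalent to the developer fix.
    return p.passes_all_tests and p.equivalent_to_fix


def count_plausible_bugs(results: list[PatchResult]) -> int:
    # A bug counts once if any of its candidate patches is plausible.
    return len({r.bug_id for r in results if is_plausible(r)})


def count_correct_bugs(results: list[PatchResult]) -> int:
    # A bug counts once if any of its candidate patches is correct.
    return len({r.bug_id for r in results if is_correct(r)})


if __name__ == "__main__":
    results = [
        PatchResult("Chart-1", True, True),   # plausible and correct
        PatchResult("Chart-1", True, False),  # second patch for the same bug
        PatchResult("Lang-6", True, False),   # overfits the tests: plausible only
        PatchResult("Math-5", False, False),  # fails tests
    ]
    print(count_plausible_bugs(results))  # 2
    print(count_correct_bugs(results))    # 1
```

In this toy example, "Lang-6" shows why the leaderboard reports correct rather than plausible patches: a patch can pass every test yet still differ in behavior from the real fix.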
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | SRepair | verified | 101 | 2024 | Paper |
| 02 | Claude Opus 4 | verified | 89 | 2026 | Source |
| 03 | GPT-4o | verified | 82 | 2024 | Paper |
| 04 | ChatRepair | verified | 78 | 2024 | Paper |
| 05 | AlphaRepair | verified | 23 | 2022 | Paper |