Program Repair · 2014 · en
Defects4J: A Database of Real Faults in Java Programs
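Standard program repair benchmark with 835 real bugs from 17 open-source Java projects. Each bug comes with a developer fix and a test suite that includes at least one triggering test. The primary metric is the number of correctly fixed bugs: a patch is plausible if it passes the full test suite, and it counts as correct only if it is also judged semantically equivalent to the developer fix.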
Current State of the Art
SRepair (SUTD): 101 correct-patches
correct-patches Progress Over Time
Showing 3 breakthroughs from Aug 2022 to Apr 2024
Key Milestones
Apr 2023
ChatRepair
Defects4J v1.2 (395 bugs). ChatRepair uses GPT-3.5 with a conversational feedback loop. TOSEM 2024.
78.0
+239.1%
Apr 2024
SRepair (Current SOTA)
Defects4J v1.2 single-function subset (300 bugs). SRepair is a multi-agent approach built on GPT-4. ICSE 2025.
101.0
+29.5%
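The two milestone notes report scores on different bug sets (395 bugs versus a 300-bug single-function subset), so the raw counts are not directly comparable. A minimal sketch of normalizing them to a fix rate follows. It assumes each count was measured on exactly the subset named in its note; the helper name `fix_rate` is illustrative and not part of any benchmark tooling.

```python
# Scores and benchmark sizes as stated in the milestone notes on this page.
# Assumption: each score was measured on exactly the subset named in its note.
results = {
    "ChatRepair": {"correct": 78, "bugs": 395},  # Defects4J v1.2
    "SRepair": {"correct": 101, "bugs": 300},    # v1.2 single-function subset
}


def fix_rate(correct: int, bugs: int) -> float:
    """Share of benchmark bugs that received a correct patch, in percent."""
    return 100.0 * correct / bugs


rates = {
    name: round(fix_rate(r["correct"], r["bugs"]), 1)
    for name, r in results.items()
}
print(rates)
```

Under these assumptions the gap between the two systems is wider in fix rate than in raw counts, because the later score was obtained on a smaller bug set.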
Total Improvement: 339.1%
Time Span: 1y 8m
Breakthroughs: 3
Current SOTA: 101.0
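The percentage figures in the milestone list can be reproduced as relative gains over the previous best score. The sketch below assumes the Aug 2022 starting point is AlphaRepair's score of 23 from the ranking table, since that milestone is not listed explicitly. The function name `relative_improvement` is illustrative.

```python
def relative_improvement(previous: float, current: float) -> float:
    """Percentage gain of `current` over `previous`."""
    return (current - previous) / previous * 100.0


# Correct-patch counts as listed on this page, in chronological order.
# Assumption: the Aug 2022 baseline is AlphaRepair's 23 from the ranking table.
milestones = [
    ("AlphaRepair", 23.0),
    ("ChatRepair", 78.0),
    ("SRepair", 101.0),
]

# Step-by-step gain of each milestone over the one before it.
steps = [
    (name, round(relative_improvement(prev_score, score), 1))
    for (_, prev_score), (name, score) in zip(milestones, milestones[1:])
]

# Overall gain from the first to the last milestone.
total = round(relative_improvement(milestones[0][1], milestones[-1][1]), 1)

print(steps)
print(total)
```

The step gains do not add up to the total because each one is measured against a different baseline.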
Top Models Performance Comparison
Top 5 models ranked by correct-patches
Best Score: 101.0
Top Model: SRepair
Models Compared: 5
Score Range: 78.0 (from 23 to 101)
correct-patches (Primary)
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | SRepair (SUTD) | 101 | | Apr 2024 |
| 2 | Claude Opus 4 (API, Anthropic) | 89 | | Mar 2026 |
| 3 | GPT-4o (API, OpenAI) | 82 | | Apr 2024 |
| 4 | ChatRepair (Fudan University) | 78 | | Jan 2024 |
| 5 | AlphaRepair (ETH Zurich) | 23 | | Aug 2022 |
Related Papers (3)
SRepair: Utilizing Multiple LLM Agents for Automated Program Repair
Apr 2024 · Models: SRepair, GPT-4o
ChatRepair: A Conversational Approach to Automated Program Repair
Jan 2024 · Models: ChatRepair
Less Training, More Repairing Please: Revisiting Automated Program Repair via Zero-Shot Learning
Aug 2022 · Models: AlphaRepair