Program Repair | 2014 | en

Defects4J: A Database of Real Faults in Java Programs

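A standard program repair benchmark with 835 real bugs from 17 open-source Java projects. Each bug comes with the developer's fix and a test suite containing at least one test that triggers the fault. The primary metric is the number of correctly fixed bugs. Results distinguish plausible patches, which pass the test suite, from correct patches, which also match the intended fix.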
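The difference between the two metrics can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own tooling: the `PatchResult` record and its field names are hypothetical, and it assumes the usual convention that a patch is plausible when it passes every test and correct when it is additionally judged equivalent to the developer's fix (normally by manual review).

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    """Validation outcome for one candidate patch (hypothetical record)."""
    bug_id: str
    passes_triggering_tests: bool   # tests that fail on the buggy version
    passes_regression_tests: bool   # the rest of the project's test suite
    matches_developer_fix: bool     # assumption: decided by manual review


def is_plausible(patch: PatchResult) -> bool:
    # Plausible: the patched program passes the whole test suite.
    return patch.passes_triggering_tests and patch.passes_regression_tests


def is_correct(patch: PatchResult) -> bool:
    # Correct: plausible and judged equivalent to the developer's fix.
    return is_plausible(patch) and patch.matches_developer_fix


def count_fixed_bugs(results: list[PatchResult]) -> dict[str, int]:
    # A bug counts once, no matter how many candidate patches fix it.
    plausible = {r.bug_id for r in results if is_plausible(r)}
    correct = {r.bug_id for r in results if is_correct(r)}
    return {"plausible-patches": len(plausible), "correct-patches": len(correct)}


results = [
    PatchResult("Lang-1", True, True, True),     # correct
    PatchResult("Lang-1", True, True, False),    # plausible only, same bug
    PatchResult("Math-5", True, True, False),    # plausible, overfits the tests
    PatchResult("Chart-3", True, False, False),  # breaks a regression test
]
summary = count_fixed_bugs(results)
```

Counting by bug rather than by patch keeps a tool from inflating its score with several patches for the same bug.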

Samples: 835
Metrics: correct-patches, plausible-patches
Current State of the Art

SRepair (SUTD): 101 correct-patches

correct-patches Progress Over Time

Showing 3 breakthroughs from Aug 2022 to Apr 2024

[Chart: correct-patches (y-axis) by Date (x-axis), Aug 2022 to Apr 2024]

Key Milestones

Aug 2022: AlphaRepair, 23.0
Defects4J v1.2 (395 bugs). AlphaRepair zero-shot. FSE 2022.

Apr 2023: ChatRepair, 78.0 (+239.1%)
Defects4J v1.2 (395 bugs). ChatRepair using GPT-3.5 with a conversational feedback loop. TOSEM 2024.

Apr 2024: SRepair (Current SOTA), 101.0 (+29.5%)
Defects4J v1.2 single-function subset (300 bugs). SRepair multi-agent with GPT-4. ICSE 2025.

Total Improvement: 339.1%
Time Span: 1y 8m
Breakthroughs: 3
Current SOTA: 101.0
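The percentage gains and the time span in this summary follow from the three milestone scores. The sketch below reproduces the arithmetic; the variable names are illustrative, and each milestone is pinned to the first day of its month as an assumption, since the page gives only month and year.

```python
from datetime import date

# Milestones as listed on this page: (date, model, correct-patches).
milestones = [
    (date(2022, 8, 1), "AlphaRepair", 23.0),
    (date(2023, 4, 1), "ChatRepair", 78.0),
    (date(2024, 4, 1), "SRepair", 101.0),
]


def percent_gain(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Gain of each milestone over the previous one.
step_gains = [
    round(percent_gain(prev[2], cur[2]), 1)
    for prev, cur in zip(milestones, milestones[1:])
]

# Gain of the latest milestone over the first one.
total_gain = round(percent_gain(milestones[0][2], milestones[-1][2]), 1)

# Whole months between the first and last milestone, split into years and months.
first, last = milestones[0][0], milestones[-1][0]
months = (last.year - first.year) * 12 + (last.month - first.month)
years, remaining_months = divmod(months, 12)

print(step_gains, total_gain, f"{years}y {remaining_months}m")
```

The step gains do not sum to the total, because each step is measured against a different baseline.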

Top Models Performance Comparison

Top 5 models ranked by correct-patches

Rank  Model          correct-patches  % of best
1     SRepair        101.0            100.0%
2     Claude Opus 4  89.0             88.1%
3     GPT-4o         82.0             81.2%
4     ChatRepair     78.0             77.2%
5     AlphaRepair    23.0             22.8%

Best Score: 101.0
Top Model: SRepair
Models Compared: 5
Score Range: 78.0
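The "% of best" column and the score range can be recomputed from the raw scores. This is a small sketch with illustrative names. It assumes "% of best" means each score divided by the top score and "Score Range" means the best score minus the worst.

```python
# correct-patches per model, as listed in the comparison above.
scores = {
    "SRepair": 101.0,
    "Claude Opus 4": 89.0,
    "GPT-4o": 82.0,
    "ChatRepair": 78.0,
    "AlphaRepair": 23.0,
}

best_model = max(scores, key=scores.get)
best_score = scores[best_model]

# Each model's score as a percentage of the top score.
percent_of_best = {
    model: round(score / best_score * 100.0, 1)
    for model, score in scores.items()
}

# Spread between the strongest and weakest model in the comparison.
score_range = best_score - min(scores.values())

# Print the models from best to worst.
for rank, (model, score) in enumerate(
    sorted(scores.items(), key=lambda item: item[1], reverse=True), start=1
):
    print(rank, model, score, f"{percent_of_best[model]}%")
```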

correct-patches (Primary)

Related Papers: 3