Defects4J.

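Defects4J is a standard program repair benchmark with 835 real bugs from 17 open-source Java projects. Each bug comes with the developer's fix and a test suite that includes at least one test triggering the bug. The primary metric is the number of correctly fixed bugs: patches that are plausible (they pass the test suite) and also correct (they match the intent of the developer's fix).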

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

Correct Patches

Correct Patches is the reported evaluation metric for Defects4J. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
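
As a rough illustration of how a count like this is typically tallied, the sketch below separates plausible patches (the test suite passes) from correct ones (plausible and also judged equivalent to the developer's fix). The `PatchResult` type, the `judged_correct` flag, and the sample bug IDs are hypothetical and exist only for this example; they are not taken from Codesota or from any of the listed papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchResult:
    bug_id: str           # illustrative ID: project name plus bug number
    passes_tests: bool    # plausible: the full test suite passes with the patch
    judged_correct: bool  # hypothetical flag: judged equivalent to the developer fix


def count_fixed(results):
    """Return (plausible_bugs, correct_bugs), counting each bug at most once."""
    plausible = {r.bug_id for r in results if r.passes_tests}
    # A patch only counts as correct if it is plausible as well.
    correct = {r.bug_id for r in results if r.passes_tests and r.judged_correct}
    return len(plausible), len(correct)


# Made-up sample data, not real benchmark results.
results = [
    PatchResult("Lang-1", True, True),     # plausible and correct
    PatchResult("Lang-1", True, False),    # second candidate for the same bug
    PatchResult("Math-5", True, False),    # plausible only (overfits the tests)
    PatchResult("Chart-3", False, False),  # fails the tests
]

plausible_bugs, correct_bugs = count_fixed(results)
print(plausible_bugs, correct_bugs)  # prints "2 1"
```

Counting by bug rather than by patch matters here: a tool that emits several candidate patches for one bug still contributes at most one to the score.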

Trust tiers for Correct Patches: verified, paper, vendor, community, unverified
| Rank | Model | Notes | Trust | Score | Year | Links |
|------|-------|-------|-------|-------|------|-------|
| 01 | SRepair | Defects4J v1.2 single-function subset (300 bugs). SRepair multi-agent with GPT-4. ICSE 2025. | verified | 101 | 2024 | Paper |
| 02 | Claude Opus 4 | Defects4J v1.2 evaluation, single-function bugs. | verified | 89 | 2026 | Source |
| 03 | GPT-4o | Defects4J v1.2, single-function bugs. GPT-4o direct repair. Reported in SRepair evaluation. | verified | 82 | 2024 | Paper |
| 04 | ChatRepair | Defects4J v1.2 (395 bugs). ChatRepair using GPT-3.5 with conversational feedback loop. TOSEM 2024. | verified | 78 | 2024 | Paper |
| 05 | AlphaRepair | Defects4J v1.2 (395 bugs). AlphaRepair zero-shot. FSE 2022. | verified | 23 | 2022 | Paper |
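
The rows above are evaluated on different subsets (300 single-function bugs in one case, 395 bugs in others), so raw counts are not directly comparable. A minimal sketch of one way to normalize them is shown below; it uses only the rows whose notes state a subset size, and the `fix_rate` helper is a hypothetical name introduced for this example. The Claude Opus 4 and GPT-4o rows are left out because their notes do not give a denominator.

```python
# (model, correct patches, bugs in the evaluated subset), copied from the
# leaderboard rows that state a subset size.
rows = [
    ("SRepair", 101, 300),
    ("ChatRepair", 78, 395),
    ("AlphaRepair", 23, 395),
]


def fix_rate(correct, total):
    """Percentage of bugs in the evaluated subset that were correctly fixed."""
    return 100.0 * correct / total


rates = {name: round(fix_rate(correct, total), 1) for name, correct, total in rows}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
# SRepair: 33.7%
# ChatRepair: 19.7%
# AlphaRepair: 5.8%
```

Even after normalizing, the rates describe different bug populations, so they are best read as a rough guide, not a strict ranking.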
