Who leads the Defects4J benchmark?

SRepair currently leads Defects4J with a score of 101 on Correct Patches.

What is the state-of-the-art score on Defects4J?

The state-of-the-art result on Defects4J is 101 (Correct Patches), achieved by SRepair in 2024 and still the top tracked score among results through 2026.

How many models are tracked on Defects4J?

Codesota tracks 5 models on Defects4J.

When was the Defects4J leaderboard last updated?

The Defects4J leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2022.

Codesota · Benchmark · Defects4J

Defects4J.

Name: Defects4J Benchmark Results
Creator: Unknown
Published: 2022-01-01
License: https://creativecommons.org/licenses/by/4.0/

Standard program repair benchmark with 835 real bugs from 17 open-source Java projects. Each bug comes with a developer fix and a triggering test suite. The primary metric is the number of correctly fixed bugs: bugs for which a generated patch is plausible (it passes the test suite) and is also judged correct.
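The distinction between plausible and correct patches can be sketched in a few lines. This is a minimal illustration, not Codesota's or Defects4J's own tooling: the `Patch` record, its field names, and the example bug IDs are assumptions made up for this sketch, and the `judged_correct` flag stands in for whatever correctness review a paper applies on top of the test suite.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    """Hypothetical record for one generated patch (not a Defects4J type)."""
    bug_id: str           # illustrative ID such as "Lang-1"
    passes_tests: bool    # plausible: the bug's test suite passes
    judged_correct: bool  # assumed outcome of a separate correctness review


def count_fixed(patches):
    """Return (plausible_bugs, correct_bugs), counting each bug at most once."""
    plausible = {p.bug_id for p in patches if p.passes_tests}
    # A correct patch must be plausible first, then pass the review.
    correct = {p.bug_id for p in patches if p.passes_tests and p.judged_correct}
    return len(plausible), len(correct)


example = [
    Patch("Lang-1", passes_tests=True, judged_correct=True),
    Patch("Lang-1", passes_tests=True, judged_correct=False),  # second attempt, same bug
    Patch("Math-5", passes_tests=True, judged_correct=False),  # plausible only
    Patch("Chart-1", passes_tests=False, judged_correct=False),
]

plausible_count, correct_count = count_fixed(example)
```

Counting by bug rather than by patch matters here: several patches for the same bug still contribute one fixed bug to the score.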

§ 02 · Leaderboard

Results by metric.

Correct Patches

Correct Patches is the reported evaluation metric for Defects4J. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Correct Patches: verified, paper, vendor, community, unverified.

Muted rows were not state of the art when published: an earlier or same-year result already scored higher.

Rank	Model	Notes	Trust	Score	Year
01	SRepair	Defects4J v1.2 single-function subset (300 bugs). SRepair multi-agent with GPT-4. ICSE 2025.	verified	101	2024
02	Claude Opus 4	Defects4J v1.2 evaluation, single-function bugs.	verified	89	2026
03	GPT-4o	Defects4J v1.2, single-function bugs. GPT-4o direct repair. Reported in SRepair evaluation.	verified	82	2024
04	ChatRepair	Defects4J v1.2 (395 bugs). ChatRepair using GPT-3.5 with conversational feedback loop. TOSEM 2024.	verified	78	2024
05	AlphaRepair	Defects4J v1.2 (395 bugs). AlphaRepair zero-shot. FSE 2022.	verified	23	2022
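The muting rule described above the table can be checked mechanically. The sketch below is one reading of that rule, not Codesota's actual implementation: a row is treated as muted when some other row from the same or an earlier year has a strictly higher score. The `Result` type and `is_muted` helper are hypothetical names; the scores and years are copied from the table.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    score: int  # Correct Patches, higher is better
    year: int


# Rows copied from the leaderboard table.
RESULTS = [
    Result("SRepair", 101, 2024),
    Result("Claude Opus 4", 89, 2026),
    Result("GPT-4o", 82, 2024),
    Result("ChatRepair", 78, 2024),
    Result("AlphaRepair", 23, 2022),
]


def is_muted(row, rows):
    """Assumed rule: an earlier or same-year result already scored better."""
    return any(o.year <= row.year and o.score > row.score for o in rows)


muted = [r.model for r in RESULTS if is_muted(r, RESULTS)]
sota_when_published = [r.model for r in RESULTS if not is_muted(r, RESULTS)]
```

Under this reading, only AlphaRepair (2022) and SRepair (2024) were state of the art when published. The rule compares raw scores only, so it does not account for rows evaluated on different bug subsets (300 versus 395 bugs).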
