arXiv:2606.05922
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
Authors pending
Abstract
A self-supervised method that re-solves past tasks and selects harness updates by pairwise self-preference over the resulting trajectories. It improves the SWE-Bench Pro pass rate from 59% to 78% (a gain of 19 percentage points) without external labels.
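The abstract names the selection mechanism but this entry does not record its details. As a rough illustration only, the sketch below assumes that each candidate harness re-solves every past task and that the candidate with the most pairwise preference wins is kept. The names `select_harness`, `rollout`, `prefer`, and the toy stand-ins are hypothetical and not taken from the paper; in a real setting `rollout` would run the agent and `prefer` would query the model itself.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence

Trajectory = str  # assumption: a rollout rendered as plain text


def select_harness(
    candidates: Sequence[str],
    tasks: Sequence[str],
    rollout: Callable[[str, str], Trajectory],
    prefer: Callable[[str, Trajectory, Trajectory], int],
) -> str:
    """Return the candidate harness with the most pairwise wins.

    rollout(harness, task) re-solves a past task under a harness.
    prefer(task, a, b) returns 0 if trajectory a is preferred, 1 if b is.
    Candidates are assumed to be unique. Ties go to the earliest candidate,
    so listing the current harness first keeps it unless a challenger wins.
    """
    wins: Dict[str, int] = {c: 0 for c in candidates}
    for task in tasks:
        # One rollout per (candidate, task) pair, reused across comparisons.
        trajs = {c: rollout(c, task) for c in candidates}
        for a, b in combinations(candidates, 2):
            winner = a if prefer(task, trajs[a], trajs[b]) == 0 else b
            wins[winner] += 1
    return max(candidates, key=lambda c: wins[c])


# Toy stand-ins (hypothetical): longer harness text yields a longer rollout,
# and the judge prefers the shorter trajectory.
def toy_rollout(harness: str, task: str) -> Trajectory:
    return f"{task}:" + "step;" * len(harness)


def toy_prefer(task: str, a: Trajectory, b: Trajectory) -> int:
    return 0 if len(a) <= len(b) else 1


best = select_harness(
    ["baseline-long-prompt", "short"], ["t1", "t2"], toy_rollout, toy_prefer
)
print(best)  # short
```

No external labels appear anywhere in this loop: the only supervision signal is the pairwise judgment, which is the property the abstract emphasizes.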
Tasks
Results
No benchmark results recorded yet.
Benchmark results referencing this paper haven't been added to the registry yet. If you have a reproduction, submit it.
CodeSOTA extraction
Benchmark evidence
Link this paper to benchmark rows, datasets, model cards, and reproduced results as evidence is extracted.