Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

arXiv:2606.08960 · Submitted Jun 9, 2026 · 0 benchmark results

Authors pending

Abstract

A hacker-fixer loop drives the attack success rate on KernelBench from 62% to 0%.
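The adversarial loop named in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's method: a "hacker" proposes exploits that the current grader still accepts, a "fixer" patches the grader against each exploit that got through, and the attack success rate (ASR) is recomputed after every round until no exploit succeeds. The classes Exploit and Grader, the functions hacker, fixer, attack_success_rate and hacker_fixer_loop, the budget and max_rounds parameters, and the exploit names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set


@dataclass(frozen=True)
class Exploit:
    # Hypothetical exploit: a named trick that a weak grader may fail to catch.
    name: str


@dataclass
class Grader:
    # Hypothetical grader: the set of exploit names it currently defends against.
    defenses: Set[str] = field(default_factory=set)

    def accepts(self, exploit: Exploit) -> bool:
        # An exploit "succeeds" if the grader has no defense for it yet.
        return exploit.name not in self.defenses


def attack_success_rate(grader: Grader, exploits: Sequence[Exploit]) -> float:
    # Fraction of exploits in the pool that the grader still accepts.
    if not exploits:
        return 0.0
    hits = sum(grader.accepts(e) for e in exploits)
    return hits / len(exploits)


def hacker(grader: Grader, pool: Sequence[Exploit], budget: int) -> List[Exploit]:
    # Hypothetical hacker: returns up to `budget` exploits that currently pass.
    return [e for e in pool if grader.accepts(e)][:budget]


def fixer(grader: Grader, successful: Sequence[Exploit]) -> None:
    # Hypothetical fixer: adds a defense for every exploit that got through.
    for e in successful:
        grader.defenses.add(e.name)


def hacker_fixer_loop(
    grader: Grader, pool: Sequence[Exploit], budget: int = 2, max_rounds: int = 10
) -> List[float]:
    # Alternate hacker and fixer rounds; record the ASR before and after each fix.
    history = [attack_success_rate(grader, pool)]
    for _ in range(max_rounds):
        found = hacker(grader, pool, budget)
        if not found:
            break  # no exploit succeeds any more
        fixer(grader, found)
        history.append(attack_success_rate(grader, pool))
    return history


if __name__ == "__main__":
    pool = [Exploit(f"exploit_{i}") for i in range(5)]
    grader = Grader(defenses={"exploit_0", "exploit_1"})
    print(hacker_fixer_loop(grader, pool, budget=2))  # prints [0.6, 0.2, 0.0]
```

In this toy the exploit pool is fixed and finite and every patch is perfect, so the loop always reaches an ASR of 0. A real hacker can discover new exploits and a real fix can be incomplete, which is why the ASR has to be re-measured after every round rather than assumed. The numbers printed here are illustrative only and do not reproduce the 62% to 0% result.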


Results

No benchmark results recorded yet.

