arXiv:2606.08960
Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops
Authors pending
Abstract
A hacker-fixer loop drives the attack success rate from 62% to 0% on KernelBench.
Tasks
Results
No benchmark results recorded yet.