arXiv:2606.06523
Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory
Authors pending
Abstract
Lean4Agent is presented as the first Lean4-based framework for verifying agent behavior. Workflows that pass verification outperform those that fail by an average of 11.94% on SWE-Bench-Verified and ELAIP-Bench, and revising workflows with LeanEvolve yields a further 7.47% gain.
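To make "formal modeling and verification of a workflow" concrete, here is a minimal Lean 4 sketch. It is not taken from the paper: the `Step` type, the `Workflow` alias, and the `endsWithTest` property are all hypothetical names chosen for illustration, and the paper's actual encoding may look quite different. The sketch only shows the general idea of stating a workflow property and having Lean check it.

```lean
-- Hypothetical illustration; none of these names come from the paper.
-- A toy set of agent steps.
inductive Step where
  | plan
  | edit
  | test
  deriving DecidableEq, Repr

-- A workflow is modeled here as a plain list of steps (an assumption).
abbrev Workflow := List Step

-- Toy property: the last step of the workflow is a test step.
def endsWithTest : Workflow → Bool
  | [] => false
  | [Step.test] => true
  | [_] => false
  | _ :: rest => endsWithTest rest

-- A workflow that passes the check.
example : endsWithTest [Step.plan, Step.edit, Step.test] = true := by decide

-- A workflow that fails the check.
example : endsWithTest [Step.plan, Step.edit] = false := by decide
```

In this toy setting, a "verification-passing" workflow is one for which the stated property can be proved, and a "failing" workflow is one for which it cannot.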
Tasks
Results
No benchmark results recorded yet.
CodeSOTA extraction
Benchmark evidence
- Lean4Agent: verification-passing vs. failing workflow accuracy on SWE-Bench-Verified subset (abstract reports 11.94% average difference)