arXiv:2605.19099
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows
Authors pending
Abstract
Emergent delegation evaluation across GAIA, BFCL, and tau-bench
Tasks
Results
No benchmark results recorded yet.
CodeSOTA extraction
Benchmark evidence
- Verify that DecisionBench routing fidelity-at-1 ranges from 7.5% to 29.5% across conditions.
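A reproduction could check the claim above with a small script. The sketch below is illustrative only: this page does not define routing fidelity-at-1, so the code assumes it is the percentage of cases in which the top-ranked routing target matches a reference target. The function names `fidelity_at_1` and `within_reported_range` and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def fidelity_at_1(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Percentage of cases whose top-1 routed target equals the reference target.

    ASSUMPTION: this top-1 agreement definition is a guess at what
    "routing fidelity-at-1" means; the paper may define it differently.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one case is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(predicted)


def within_reported_range(value: float, low: float = 7.5, high: float = 29.5) -> bool:
    """Check a per-condition score against the 7.5% to 29.5% range quoted above."""
    return low <= value <= high


if __name__ == "__main__":
    # Hypothetical routing decisions for one condition (not real data).
    predicted = ["planner", "coder", "search", "coder"]
    reference = ["planner", "search", "coder", "planner"]
    score = fidelity_at_1(predicted, reference)
    print(f"fidelity-at-1: {score:.1f}%  in range: {within_reported_range(score)}")
```

Running the check once per condition and comparing the minimum and maximum scores against the quoted bounds would be one way to confirm or refute the range.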