arXiv:2605.18859
TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing
Authors pending
Abstract
Step-level LLM routing benchmark with static and dynamic tracks for agentic workflows
Tasks
Results
No benchmark results recorded yet.
Benchmark results referencing this paper haven't been added to the registry yet.
CodeSOTA extraction
Benchmark evidence
- Verify that the highest success rate on TwinRouterBench is 64.8% for computer-use models.