arXiv:2606.03918
Hedge-Bench: Benchmarking Agents on Hard, Realistic Tasks Pertaining to Financial Reasoning
Authors pending
Abstract
102 real hedge-fund analyst tasks with deterministic grading; frontier models score below 16%.
Tasks
Results
No benchmark results referencing this paper have been added to the registry yet.
CodeSOTA extraction
Benchmark evidence
- Confirm Hedge-Bench's deterministic grading methodology and whether the below-16% score holds for GPT-5 and Claude Opus 4.