HotpotQA

Unknown

OCR benchmark

Total Results: 2
Models Tested: 2
Metrics: 1
Last Updated: 2026-03-06

Metric: F1 (higher is better)
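HotpotQA answers are commonly scored with token-level F1 in the SQuAD style: both strings are normalized, split into tokens, and precision and recall are computed from the overlapping tokens. The page does not say which evaluation script produced the scores below, so this is a minimal sketch assuming SQuAD-style normalization (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace). The function names `normalize_answer` and `token_f1` are illustrative.

```python
import re
import string
from collections import Counter

# Assumption: SQuAD-style normalization; the page does not name the scorer used.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection: a shared token counts min(pred_count, gold_count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision 2/4, recall 2/2, so F1 = 2/3.
    print(round(token_f1("the Eiffel Tower in Paris", "Eiffel Tower"), 3))  # 0.667
```

The official HotpotQA evaluation script also special-cases yes/no answers; that detail is left out of this sketch.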

Multi-hop question answering requiring reasoning over Wikipedia.

Rank  Model             Score  Source
1     gpt-4o            71.3   arxiv-paper
2     claude-35-sonnet  68.5   arxiv-paper
