hotpotqa

Unknown

OCR benchmark

Total Results: 2
Models Tested: 2
Metrics: 1
Last Updated: 2025-12-21

f1

Higher is better
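The page does not say how the f1 score is computed. For HotpotQA answers, F1 is conventionally a token-overlap score between the predicted and gold answer strings after light normalization. The sketch below illustrates that idea under this assumption. The function names `normalize_answer` and `token_f1` are illustrative, and the normalization is a simplified version that may differ from the exact script behind the listed scores.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = normalize_answer(prediction)
    gold_tokens = normalize_answer(gold)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("The Eiffel Tower", "Eiffel Tower"))
    print(token_f1("Paris France", "Paris"))
```

A benchmark-level score like the ones listed here would then typically be the mean of the per-question F1 values, scaled to 0-100; that aggregation is also an assumption, not something the page states.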

Multi-hop question answering requiring reasoning over Wikipedia.

Rank  Model             Score  Source
1     gpt-4o            71.3   arxiv-paper
2     claude-35-sonnet  68.5   arxiv-paper
