HotpotQA


113K question-answer pairs requiring reasoning over multiple Wikipedia documents.

Benchmark Stats

Models: 2
Papers: 2
Metrics: 1


f1


Higher is better
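The page does not say which F1 variant is reported. A minimal sketch, assuming it is the token-overlap F1 commonly used for extractive question answering, might look like the following. The function names `normalize_answer` and `token_f1` are illustrative, not taken from this page, and the normalization steps (lowercasing, dropping punctuation and English articles) are an assumption about the scoring convention.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumed normalization; the leaderboard does not document its own.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize_answer(prediction)
    gold_tokens = normalize_answer(gold)

    # If either side is empty after normalization, score 1.0 only when both are.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this assumption, a benchmark-level score would be the mean of `token_f1` over all questions, scaled to 0-100 to match the range of the scores listed on this page.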

Rank  Model             Code  Score  Paper / Source
1     gpt-4o            -     71.3   arXiv Paper
2     claude-35-sonnet  -     68.5   arXiv Paper

Multi-hop question answering requiring reasoning over Wikipedia.