HotpotQA
113K question-answer pairs requiring reasoning over multiple Wikipedia documents.
Benchmark Stats
Models: 2
Papers: 2
Metrics: 1
SOTA History
Not enough data to show a trend: only 2 models have results on this benchmark.
F1
Higher is better
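The page does not say how F1 is computed. As a minimal sketch, assuming the token-overlap answer F1 commonly used for extractive question answering, the score could be computed as below. The function names `normalize_answer` and `token_f1` are hypothetical, and the benchmark's official evaluation script may normalize or score answers differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, strip punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall (assumed metric definition)."""
    pred_tokens = normalize_answer(prediction)
    gold_tokens = normalize_answer(gold)
    # If either side is empty after normalization, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition a prediction that contains the gold answer plus extra tokens gets partial credit, which is why F1 is usually reported alongside exact match.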