Multi-step Reasoning | 2018 | en

HotpotQA

113K question-answer pairs requiring reasoning over multiple Wikipedia documents.

Current State of the Art

GPT-4o

OpenAI

71.3 (F1, primary metric)

#   Model                     Organization   Score (F1)   Paper / Code   Date
1   GPT-4o (API)              OpenAI         71.3         -              Dec 2025
2   Claude 3.5 Sonnet (API)   Anthropic      68.5         -              Dec 2025
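
The leaderboard reports F1 but does not say how it is computed. For reference, here is a minimal sketch of the token-overlap F1 commonly used to score short free-text answers in extractive QA. It assumes SQuAD-style normalization (lowercasing, stripping punctuation and English articles); whether the scores above were produced exactly this way is an assumption. The function names `normalize_answer` and `token_f1` are illustrative, not taken from any official script.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between two answer strings."""
    pred_tokens = normalize_answer(prediction)
    gold_tokens = normalize_answer(gold)
    # If either side is empty after normalization, score 1.0 only when both are.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, a dataset-level score would be the mean of `token_f1` over all questions, multiplied by 100 to match the scale of the table.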

Other Multi-step Reasoning Datasets