DROP (Discrete Reasoning Over Paragraphs) is an English reading-comprehension benchmark whose questions require discrete, multi-step reasoning over a paragraph: addition, counting, sorting, and resolving references to several positions in the passage. It was introduced by Dua et al. in "DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs" (NAACL 2019; arXiv:1903.00161). The questions were crowdsourced adversarially, with a baseline model in the loop, so that they cannot be answered through shallow shortcuts. The full collection contains approximately 96k question–answer pairs over ~6.7k passages, split into train (≈77.4k), dev (≈9.54k), and a hidden test set (≈9.5k). Only the train and dev splits are publicly available; on Hugging Face the dev split is named validation. Answers are either spans (one or several) or free-form values such as numbers and dates, and numerical reasoning is a core focus. Evaluation follows common QA practice and reports exact match (EM) and word-level F1. The dataset is released under a CC BY-SA 4.0 license, hosted by the Allen Institute for AI, and mirrored on Hugging Face.
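To make the two metrics concrete, the following is a minimal sketch of exact match and word-level F1 for a single prediction. It assumes SQuAD-style normalization (lowercasing, dropping punctuation and English articles) and takes the best score over all reference answers. The function names `normalize`, `exact_match`, `token_f1`, and `score` are illustrative, and this is not the official DROP evaluation script, which handles numbers and multi-span answers with additional care.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and English articles, split into tokens.

    Simplification: removing punctuation also removes decimal points,
    so "3.5" becomes the single token "35".
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction, gold):
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction, gold):
    """Harmonic mean of bag-of-words precision and recall."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction, golds):
    """Return (EM, F1), each maximized over the reference answers."""
    em = max(exact_match(prediction, g) for g in golds)
    f1 = max(token_f1(prediction, g) for g in golds)
    return em, f1


if __name__ == "__main__":
    print(score("Denver Broncos", ["Broncos", "the Denver Broncos"]))
```

Dataset-level EM and F1 would then be the averages of these per-question scores over the evaluated split.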
No results indexed yet — be the first to submit a score.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.