Discrete Reasoning Over Paragraphs (DROP).

DROP (Discrete Reasoning Over Paragraphs) is an English reading-comprehension benchmark whose questions require discrete, multi-step reasoning over a paragraph, such as addition, counting, sorting, and resolving references to several positions in the passage. It was introduced by Dua et al. in "DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs" (NAACL 2019; arXiv:1903.00161). The questions were crowdsourced adversarially so that shallow shortcuts do not suffice. The full collection contains approximately 96k question–answer pairs over ~6.7k passages (train ≈77k, dev ≈9.5k, hidden test ≈9.5k). The publicly available copies on Hugging Face and other mirrors contain only the train and dev splits (train ≈77.4k, validation ≈9.54k). Answers are either spans or free-form values such as numbers; numerical reasoning is a core focus. Evaluation follows common QA practice and reports exact match (EM) and word-level F1. The dataset is provided under a CC BY license; it is hosted by the Allen Institute for AI and mirrored on Hugging Face.
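To make the two metrics concrete, here is a minimal sketch of exact match and word-level F1 in the common SQuAD style. It is not the official DROP evaluator: the normalization rules (lowercasing, dropping punctuation and English articles) and the helper names `normalize`, `exact_match`, `f1_score`, and `best_over_references` are assumptions made for illustration. In particular, stripping punctuation would turn "3.5" into "35", so a real evaluator needs dedicated handling for numbers and multi-span answers.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumption: simplified normalization, not the official DROP rules.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, prediction: str, golds: list[str]) -> float:
    """Score against each reference answer and keep the best score."""
    return max(metric(prediction, gold) for gold in golds)
```

When a question has several reference answers, taking the maximum over them, as `best_over_references` does, is the usual convention; whether a given evaluation setup does this should be checked against its own scoring script.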


§ 01 · Leaderboard

Best published scores.

No results indexed yet — be the first to submit a score.


§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.


What a submission needs

01 A public checkpoint or API endpoint
02 A reproduction script with frozen commit + seed
03 Declared evaluation environment (Python, deps)
04 One row per metric declared by this dataset
05 A contact so we can follow up on discrepancies
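The five items above could be gathered into a single record, as in the sketch below. The submission format is not specified on this page, so the function `build_submission`, every field name, and the `REQUIRED_METRICS` tuple are hypothetical; the metric names follow the EM and F1 metrics the dataset declares.

```python
import json
import platform
import sys

# Assumption: the two metrics this dataset declares, under hypothetical keys.
REQUIRED_METRICS = ("em", "f1")


def build_submission(checkpoint, commit, seed, scores, contact):
    """Assemble a hypothetical submission record covering the five items."""
    missing = [name for name in REQUIRED_METRICS if name not in scores]
    if missing:
        raise ValueError(f"missing metrics: {missing}")
    return {
        "checkpoint": checkpoint,          # 01: public checkpoint or endpoint
        "commit": commit,                  # 02: frozen commit of the script
        "seed": seed,                      # 02: seed used for the run
        "environment": {                   # 03: declared evaluation environment
            "python": platform.python_version(),
            "platform": sys.platform,
        },
        "results": [                       # 04: one row per declared metric
            {"metric": name, "value": scores[name]}
            for name in REQUIRED_METRICS
        ],
        "contact": contact,                # 05: follow-up contact
    }


if __name__ == "__main__":
    # Placeholder values only; these are not real scores or addresses.
    record = build_submission(
        checkpoint="https://example.org/checkpoint",
        commit="<commit-hash>",
        seed=0,
        scores={"em": 0.0, "f1": 0.0},
        contact="name@example.org",
    )
    print(json.dumps(record, indent=2))
```

Rejecting a record that lacks a declared metric keeps the "one row per metric" requirement checkable before anything is sent.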
