Codesota · Natural Language Processing · Language Modeling · DROP
Language Modeling · benchmark dataset · EN

Discrete Reasoning Over Paragraphs (DROP).

DROP (Discrete Reasoning Over Paragraphs) is an English reading-comprehension benchmark whose questions require discrete, multi-step reasoning over a paragraph, such as addition, counting, sorting, and resolving references to several positions in the passage. It was introduced by Dua et al. in "DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs" (NAACL 2019; arXiv:1903.00161). The dataset was crowdsourced and adversarially created to discourage shallow shortcuts. The full collection contains approximately 96k question–answer pairs over ~6.7k passages (train ≈77k, dev ≈9.5k, hidden test ≈9.5k). Only the train and dev splits are public; on Hugging Face and other mirrors they appear as train (≈77.4k) and validation (≈9.54k). Answers are either spans or free-form numeric values, and numerical reasoning is a core focus. Evaluation follows common QA practice and reports exact match (EM) and word-level F1. The dataset is released under a CC BY-SA 4.0 license, is hosted by the Allen Institute for AI, and is mirrored on Hugging Face.
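To make the two metrics concrete, here is a minimal sketch of exact match and word-level F1 in the generic SQuAD style. It is not the official DROP evaluation script, which treats numbers and multi-span answers more carefully; the function names (`normalize`, `exact_match`, `token_f1`, `score`) and the normalization rules are assumptions chosen for illustration.

```python
import string
from collections import Counter

ARTICLES = {"a", "an", "the"}


def normalize(text: str) -> list[str]:
    """Lowercase, trim punctuation at token edges, and drop English articles.

    Simplification: trimming edge punctuation also removes a leading minus
    sign, so negative numbers are not handled correctly here.
    """
    tokens = []
    for raw in text.lower().split():
        tok = raw.strip(string.punctuation)
        if tok and tok not in ARTICLES:
            tokens.append(tok)
    return tokens


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction: str, golds: list[str]) -> tuple[float, float]:
    """Best EM and best F1 over all acceptable gold answers."""
    em = max(exact_match(prediction, g) for g in golds)
    f1 = max(token_f1(prediction, g) for g in golds)
    return em, f1
```

For example, `score("The Seahawks", ["Seahawks"])` counts as an exact match because articles and case are ignored, while a prediction that contains one extra word beyond the gold answer gets partial F1 credit and no EM credit. Dataset-level scores are the averages of these per-question values.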

§ 01 · Leaderboard

Best published scores.

No results indexed yet — be the first to submit a score.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with frozen commit + seed
  • 03 · Declared evaluation environment (Python, deps)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies
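One way to gather the items above into a single record is sketched below. The manifest layout, the `build_manifest` function, and every field name are hypothetical and are not a format defined by Codesota; the sketch only shows how the commit, seed, environment, metric rows, and contact might be captured together using the Python standard library.

```python
import json
import platform
from importlib import metadata


def build_manifest(
    checkpoint: str,
    commit: str,
    seed: int,
    packages: list[str],
    metrics: dict[str, float],
    contact: str,
) -> dict:
    """Assemble a hypothetical submission manifest as a plain dict."""
    deps = {}
    for name in packages:
        try:
            deps[name] = metadata.version(name)  # installed version string
        except metadata.PackageNotFoundError:
            deps[name] = None  # declared but not installed here
    return {
        "checkpoint": checkpoint,  # public checkpoint or API endpoint
        "commit": commit,          # frozen commit of the reproduction script
        "seed": seed,
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
            "dependencies": deps,
        },
        # One row per metric declared by the dataset (EM and F1 for DROP).
        "results": [
            {"metric": name, "value": value}
            for name, value in sorted(metrics.items())
        ],
        "contact": contact,
    }


if __name__ == "__main__":
    # All values below are placeholders, not real results.
    manifest = build_manifest(
        checkpoint="https://example.com/checkpoint",
        commit="0000000",
        seed=0,
        packages=["numpy"],
        metrics={"em": 0.0, "f1": 0.0},
        contact="you@example.com",
    )
    print(json.dumps(manifest, indent=2))
```

Recording the installed package versions at evaluation time, rather than typing them by hand, makes it easier to trace a score discrepancy back to an environment difference.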