Language Modeling · benchmark dataset · EN

QuALITY (ZeroSCROLLS subset).

QuALITY (as used in ZeroSCROLLS) is the subset of the QuALITY multiple-choice reading-comprehension dataset included in the ZeroSCROLLS zero-shot long-context benchmark. The original QuALITY dataset (Pang et al., NAACL 2022; arXiv:2112.08608) pairs long English passages (about 5,000 tokens on average) with human-written multiple-choice questions and distractors. Questions were written and validated by annotators who read the full passage, so many require understanding the whole text and cannot be answered by skimming or from a short excerpt. ZeroSCROLLS (arXiv:2305.14196) uses the QuALITY data as a test set, plus a small validation set, to evaluate long-context understanding in a zero-shot setting. Use cases: long-document QA and reading comprehension, multiple-choice QA over long contexts.
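As a rough illustration of what zero-shot multiple-choice evaluation over a long passage involves, the sketch below builds a prompt, pulls a letter out of a model's free-text reply, and computes accuracy. The prompt wording, the four labels A to D, and the helper names `build_prompt`, `extract_choice`, and `accuracy` are assumptions made for this example; they are not the official ZeroSCROLLS template or scorer.

```python
import re

# Assumption: four answer options per question, labelled A-D.
CHOICE_LABELS = ["A", "B", "C", "D"]


def build_prompt(passage, question, options):
    """Format one example as a zero-shot multiple-choice prompt (hypothetical wording)."""
    option_lines = [f"({label}) {text}" for label, text in zip(CHOICE_LABELS, options)]
    return (
        "Read the story and answer the question with a single letter.\n\n"
        f"Story:\n{passage}\n\n"
        f"Question: {question}\n"
        + "\n".join(option_lines)
        + "\n\nAnswer:"
    )


def extract_choice(model_output):
    """Return the first standalone capital letter A-D in the reply, or None."""
    match = re.search(r"\b([A-D])\b", model_output)
    return match.group(1) if match else None


def accuracy(model_outputs, gold_labels):
    """Fraction of replies whose extracted letter equals the gold label."""
    if len(model_outputs) != len(gold_labels):
        raise ValueError("model_outputs and gold_labels must have the same length")
    if not gold_labels:
        return 0.0
    correct = sum(
        extract_choice(output) == gold
        for output, gold in zip(model_outputs, gold_labels)
    )
    return correct / len(gold_labels)
```

The letter extraction is deliberately simple: a reply that begins with the article "A" would be read as choice A, so a real harness would likely constrain the output format or parse more strictly.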

§ 01 · Leaderboard

Best published scores.

No results indexed yet — be the first to submit a score.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and — if it takes the top — annotate the step on the progress chart with your name.

What a submission needs
  • 01 A public checkpoint or API endpoint
  • 02 A reproduction script with frozen commit + seed
  • 03 Declared evaluation environment (Python, deps)
  • 04 One row per metric declared by this dataset
  • 05 A contact so we can follow up on discrepancies
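One way a reproduction script might record the commit, seed, and evaluation environment listed above is sketched here. The manifest layout and the `build_manifest` helper are hypothetical and do not reflect a format required by the submission guide; the commit hash is passed in by the caller rather than detected.

```python
import json
import platform
import random
from importlib import metadata


def build_manifest(commit, seed, packages):
    """Collect commit, seed, Python version, and package versions in one dict."""
    random.seed(seed)  # fix the stdlib RNG; other libraries need their own seeding
    dependencies = {}
    for name in packages:
        try:
            dependencies[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            dependencies[name] = None  # package not installed in this environment
    return {
        "commit": commit,
        "seed": seed,
        "python": platform.python_version(),
        "dependencies": dependencies,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    manifest = build_manifest(commit="<frozen commit hash>", seed=0, packages=["pip"])
    print(json.dumps(manifest, indent=2))
```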