QuALITY (ZeroSCROLLS subset).

Name: QuALITY (ZeroSCROLLS subset) Benchmark Results
Creator: Codesota
License: https://creativecommons.org/licenses/by/4.0/

QuALITY (as used in ZeroSCROLLS) is the multiple-choice reading-comprehension subset of the ZeroSCROLLS zero-shot long-context benchmark. The original QuALITY dataset (Pang et al., NAACL 2022; arXiv:2112.08608) pairs long English passages (about 5,000 tokens on average) with human-authored multiple-choice questions and distractors. Questions were written and validated by annotators who read the full passage, so many require deep comprehension and cannot be answered by skimming or from short excerpts. ZeroSCROLLS (arXiv:2305.14196) adapts the QuALITY data into a zero-shot test set, with a small validation set, for evaluating long-context understanding. Use cases: long-document QA and reading comprehension, multiple-choice QA over long contexts.
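As a rough illustration of the zero-shot multiple-choice setup described above, the sketch below formats one item as a prompt, parses a letter from a model reply, and scores accuracy. The prompt template, the four-option A-D lettering, and the helper names (`build_prompt`, `extract_choice`, `accuracy`) are assumptions made for this example; they are not the official ZeroSCROLLS template or scorer.

```python
import re
from typing import Optional, Sequence

LETTERS = "ABCD"  # assumption: four answer options per question


def build_prompt(passage: str, question: str, options: Sequence[str]) -> str:
    """Format one item as a zero-shot prompt (hypothetical template)."""
    lines = [passage.strip(), "", f"Question: {question.strip()}"]
    for letter, option in zip(LETTERS, options):
        lines.append(f"({letter}) {option.strip()}")
    lines.append("Answer:")
    return "\n".join(lines)


def extract_choice(model_output: str) -> Optional[str]:
    """Return the option letter if the reply starts with one, else None."""
    match = re.match(r"\s*\(?([A-D])\b", model_output)
    return match.group(1) if match else None


def accuracy(predictions: Sequence[Optional[str]], gold: Sequence[str]) -> float:
    """Fraction of items where the predicted letter equals the gold letter."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

An unparseable reply maps to `None` and therefore counts as wrong, which keeps the denominator fixed at the number of questions.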


§ 01 · Leaderboard

Best published scores.

No results indexed yet — be the first to submit a score.


§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.


What a submission needs

01 A public checkpoint or API endpoint
02 A reproduction script with frozen commit + seed
03 Declared evaluation environment (Python, deps)
04 One row per metric declared by this dataset
05 A contact so we can follow up on discrepancies
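The requirements above could be gathered by a small helper inside the reproduction script. The following is a minimal sketch, assuming a JSON-style report; the field names (`commit`, `seed`, `environment`, `results`, `contact`) and the function names are hypothetical and do not reflect a format defined by this page or its submission guide.

```python
import json
import platform
import random
from importlib import metadata
from typing import Dict, Iterable, Optional


def set_seed(seed: int) -> None:
    """Seed Python's RNG; a real script would also seed its ML framework."""
    random.seed(seed)


def describe_environment(packages: Iterable[str]) -> Dict[str, object]:
    """Record the Python version and installed versions of named packages."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed here
    return {"python": platform.python_version(), "packages": versions}


def build_report(commit: str, seed: int, metrics: Dict[str, float],
                 contact: str, packages: Iterable[str] = ()) -> Dict[str, object]:
    """Assemble a hypothetical report with one row per declared metric."""
    return {
        "commit": commit,
        "seed": seed,
        "environment": describe_environment(packages),
        "results": [{"metric": k, "value": v} for k, v in sorted(metrics.items())],
        "contact": contact,
    }
```

Calling `json.dumps(build_report(...), indent=2)` would serialize the report for attachment to a submission; the checkpoint or API endpoint itself is referenced separately.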
