Penn Treebank (Wall Street Journal, Section 23).

Name: Penn Treebank (Wall Street Journal, Section 23) Benchmark Results
Creator: Codesota
License: https://creativecommons.org/licenses/by/4.0/

The Wall Street Journal (WSJ) portion of the Penn Treebank (PTB) is a widely used annotated corpus of newswire text (roughly 1 million words). It was originally described in Marcus et al., 1993 ("Building a Large Annotated Corpus of English: The Penn Treebank", Computational Linguistics) and is distributed by the LDC in the Treebank releases (e.g. Treebank-3 / LDC99T42). The WSJ portion is annotated with part-of-speech (POS) tags and syntactic constituency trees and is commonly used for parsing, POS tagging and language modeling research. Section 23 is the standard test set in many parsing and language-modeling evaluations; parsing setups typically use sections 02–21 for training, 22 for development and 23 for test. Hugging Face hosts a text-only PTB dataset (ptb-text-only/ptb_text_only) that provides the PTB text splits; its dataset card notes that the source is the Penn Treebank Project / WSJ material and that licensing is via the LDC.
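The conventional parsing split mentioned above can be sketched as a small lookup. This is a minimal illustration, not an official tool: the function names `parsing_split` and `section_from_filename` are hypothetical, and the `wsj_SSNN` file-naming pattern (two section digits followed by two file digits) is an assumption about how the LDC release lays out its files.

```python
import re
from typing import Optional

# Conventional constituency-parsing split of the WSJ sections, as described above.
TRAIN_SECTIONS = range(2, 22)  # sections 02-21
DEV_SECTION = 22
TEST_SECTION = 23


def parsing_split(section: int) -> Optional[str]:
    """Return 'train', 'dev' or 'test' for a WSJ section, or None if unused."""
    if not 0 <= section <= 24:
        raise ValueError(f"WSJ sections run from 00 to 24, got {section}")
    if section in TRAIN_SECTIONS:
        return "train"
    if section == DEV_SECTION:
        return "dev"
    if section == TEST_SECTION:
        return "test"
    return None  # sections 00, 01 and 24 fall outside this particular split


def section_from_filename(name: str) -> int:
    """Extract the section number from a name like 'wsj_2300.mrg'.

    Assumes the 'wsj_SSNN' naming pattern; adjust if your copy differs.
    """
    match = re.search(r"wsj_(\d{2})\d{2}", name)
    if match is None:
        raise ValueError(f"unrecognised WSJ file name: {name!r}")
    return int(match.group(1))
```

Under these assumptions, `parsing_split(section_from_filename("wsj_2300.mrg"))` selects the test split, which is the portion this benchmark reports on.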


§ 01 · Leaderboard

Best published scores.

No results indexed yet — be the first to submit a score.


§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.


What a submission needs

01 · A public checkpoint or API endpoint
02 · A reproduction script with frozen commit + seed
03 · Declared evaluation environment (Python, deps)
04 · One row per metric declared by this dataset
05 · A contact so we can follow up on discrepancies
