Codesota · Natural Language Processing · Machine Translation · WMT 2014 English→German (newstest2014)
Machine Translation · benchmark dataset · EN

WMT 2014 English–German (WMT14 En→De, newstest2014).

WMT 2014 English–German (WMT14 En–De) is the English⇄German parallel data collection used in the shared translation task of the Ninth Workshop on Statistical Machine Translation (WMT 2014). The corpus combines several parallel sources commonly used in MT research, including Europarl, Common Crawl, and News Commentary, and is distributed with standard training, validation, and test splits.

For the English→German task, the training set contains roughly 4.5 million sentence pairs, the size reported in many papers, including “Attention Is All You Need”. The conventional validation set is newstest2013 and the test set is newstest2014. Typical preprocessing in the literature includes tokenization and byte-pair encoding (BPE) with a shared source–target vocabulary of about 37,000 tokens, as used in the Transformer paper.

The Hugging Face dataset card (wmt/wmt14) provides per-language-pair configs (e.g., de-en), lists the splits and their sizes, and warns about issues in the Common Crawl portion (misaligned and non-English files). Primary references: the WMT14 workshop pages (statmt.org/wmt14) and the Hugging Face dataset card (https://huggingface.co/datasets/wmt/wmt14).
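A minimal sketch of the record schema exposed by the wmt/wmt14 dataset card: each example is a `{"translation": {"de": ..., "en": ...}}` record. The sample sentence pair and the `to_pair` helper below are illustrative assumptions, not part of the dataset; the commented `load_dataset` call shows how the real test split would be fetched.

```python
# Loading the real data (downloads several GB on first run) would look like:
#   from datasets import load_dataset
#   wmt14 = load_dataset("wmt/wmt14", "de-en", split="test")  # newstest2014

# Hypothetical record in the wmt/wmt14 schema; the sentence pair is made up.
example = {"translation": {"de": "Guten Morgen.", "en": "Good morning."}}

def to_pair(record, src="en", tgt="de"):
    """Extract a (source, target) sentence pair from a wmt14-style record."""
    t = record["translation"]
    return t[src], t[tgt]

src_sent, tgt_sent = to_pair(example)
print(src_sent)  # Good morning.
print(tgt_sent)  # Guten Morgen.
```

The same helper works in the other direction by swapping `src` and `tgt`, which is how the de-en config serves both En→De and De→En evaluation.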

§ 01 · Leaderboard

Best published scores.

No results indexed yet — be the first to submit a score.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

Submit a result · Read submission guide
What a submission needs
  • 01 A public checkpoint or API endpoint
  • 02 A reproduction script with a frozen commit and seed
  • 03 A declared evaluation environment (Python version, dependencies)
  • 04 One row per metric declared by this dataset
  • 05 A contact address so we can follow up on discrepancies
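The checklist above can be sketched as a reproduction-script skeleton: a frozen seed, an environment declaration written next to the results, and the translation and scoring steps left as commented placeholders. All names here (`environment.json`, the sacrebleu command) are assumptions for illustration, not requirements of this site.

```python
# Hypothetical reproduction-script skeleton for a submission:
# frozen seed, declared evaluation environment, scoring step as a comment.
import json
import platform
import random
import sys

SEED = 1234  # frozen seed, declared in the submission

def declare_environment():
    """Record the evaluation environment alongside the results."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "seed": SEED,
        "argv": sys.argv,
    }

def main():
    random.seed(SEED)
    env = declare_environment()
    with open("environment.json", "w") as f:
        json.dump(env, f, indent=2)
    # Translation and scoring would follow, e.g. (assumed commands):
    #   translate newstest2014.en > hypotheses.de
    #   sacrebleu newstest2014.de -i hypotheses.de -m bleu

if __name__ == "__main__":
    main()
```

Writing the environment file from inside the script, rather than by hand, keeps the declared environment in sync with what actually ran.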