Language Modeling

Language modeling — predicting the next token — is the pretraining objective that accidentally became the foundation of modern AI. From GPT-2's "too dangerous to release" moment in 2019 to GPT-4, Claude, Llama 3, and Gemini, scaling language models has produced emergent capabilities no one predicted from loss curves alone. Perplexity on benchmarks like WikiText-103 and Penn Treebank is now largely a historical artifact; the field evaluates via downstream tasks (MMLU, HumanEval, MATH) because raw perplexity long ago stopped tracking usefulness. The frontier has moved to mixture-of-experts architectures (Mixtral, DeepSeek-V3), longer context windows (1M+ tokens), and efficient inference: the model is no longer the bottleneck; serving it is.
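Concretely, "predicting the next token" means minimizing cross-entropy: the model outputs a probability for every vocabulary item, and the loss at each position is the negative log-probability of the token that actually appeared. A minimal sketch; the vocabulary and probabilities below are invented for illustration, not taken from any real model:

```python
import math

# Invented toy distribution over a four-word vocabulary for some
# context -- purely illustrative, not from any real model.
predicted = {"the": 0.1, "cat": 0.1, "sat": 0.6, "mat": 0.2}

def next_token_loss(dist, target):
    """Cross-entropy loss for a single next-token prediction."""
    return -math.log(dist[target])

# The observed next token was "sat", which the model favored,
# so the loss is small: -ln(0.6), about 0.51 nats.
loss = next_token_loss(predicted, "sat")
```

Training averages this loss over every position in the corpus; perplexity is just the exponential of that average.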

1 dataset · 0 results · Canonical metric: perplexity

Canonical Benchmark

WikiText Perplexity

Language-modeling quality measured by perplexity on Wikipedia text (lower is better)

Primary metric: perplexity
View full leaderboard
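Perplexity, the primary metric here, is the exponential of the average per-token negative log-likelihood: a model with perplexity k is on average as uncertain as if it were choosing uniformly among k tokens. A minimal sketch, with illustrative log-probabilities:

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood per token (lower is better)."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# A model assigning probability 0.5 to each of four tokens has
# perplexity 2: as uncertain as a uniform choice between two options.
print(perplexity([math.log(0.5)] * 4))
```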

Top 10

Leading models on WikiText Perplexity.

No results yet. Be the first to contribute.

What were you looking for on Language Modeling?

Didn't find the model, metric, or dataset you needed? Tell us in one line. We read every message and reply within 48 hours.

All datasets

1 dataset tracked for this task.

Related tasks

Other tasks in Natural Language Processing.
