Extractive summary
Select source sentences and preserve auditability.
Summarization compresses text, but the real requirement is usually fidelity. Pick extractive, abstractive, or long-context summarization based on whether missing details, invented facts, or style drift are the biggest risk.
Summarization turns long input into a shorter artifact, but the output contract changes by use case. News summarization rewards compression and fluency; meeting and legal summaries need coverage; enterprise summaries need source-grounded facts, citations, and explicit handling of uncertainty.
One leaderboard rarely captures the task. Use the canonical benchmark for lineage, then add harder or more domain-specific checks before choosing a model.
| Benchmark | Role | Metric | Caveat |
|---|---|---|---|
| CNN/DailyMail | Classic news summarization | ROUGE | Useful for lineage; weak proxy for long-context, factual, or domain-specific summaries. |
| XSum | Abstractive stress test | ROUGE / human eval | Encourages concise rewriting and can reward unsupported abstraction. |
| SummEval / QAGS | Quality and factuality | Coherence / consistency / answerability | Better quality signal, but still smaller than real enterprise document sets. |
| Local source-grounded eval | Production gate | Claim support / coverage / omission rate | Needed when missed obligations or invented facts are expensive. |
The public benchmark is a shortlist signal. Production choice still depends on latency, cost, domain drift, and how expensive mistakes are.
| Axis | Value | Why it matters |
|---|---|---|
| Classic benchmark | CNN/DailyMail | Good for news-style compression, weak for modern enterprise documents. |
| Abstractive stress test | XSum | Tests concise rewriting but can reward unsupported abstraction. |
| Production metric | Factual consistency + coverage | ROUGE is not enough; check missing obligations and hallucinated claims. |
| Failure mode | Confident omission | The summary sounds good while dropping the one fact the user needed. |
Select source sentences and preserve auditability.
Better structure and tone, but needs factual checks.
Chunking and coverage tracking prevent important sections from disappearing.
Every key claim should map back to source spans.
Open the lower-level explainer for architecture, code examples, and implementation options.
Open summarization explainer ->