Text Summarization
Condense long documents into concise summaries. Essential for news aggregation, research, and document processing.
How Text Summarization Works
From extractive highlighting to abstractive generation. How models learn to condense documents while preserving meaning.
The Fundamental Question: Copy or Generate?
Every summarization system must answer this: Should I select existing sentences, or write new ones? This choice shapes everything downstream.
Identify and extract the most important sentences from the source document. The summary is a subset of the original text.
- + Always grammatical
- + Faithful to source
- + Fast
- + No hallucinations
- - Can feel choppy
- - Limited compression
- - May miss nuance
- - Cannot paraphrase
Generate new text that captures the key information from the source. The model can paraphrase, combine ideas, and use words not in the original.
- + Natural flow
- + High compression
- + Can synthesize
- + More flexible
- - May hallucinate
- - Can miss facts
- - Slower
- - Needs more data
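The extractive side is simple enough to sketch in a few lines: score each sentence by the average frequency of its words, then keep the top scorers in document order. This is a toy stand-in for real extractors like TextRank or BERT-based sentence rankers; the function names and the scoring heuristic are illustrative.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Score sentences by word frequency; return top scorers in document order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))

    def score(sentence):
        tokens = re.findall(r'\w+', sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    # Preserve original ordering so the extract reads naturally
    return " ".join(s for s in sentences if s in top)

doc = ("The Amazon rainforest produces about 20% of the world's oxygen. "
      "It spans nine countries. "
      "Deforestation threatens biodiversity and global climate stability. "
      "The rainforest covers 5.5 million square kilometers.")
print(extractive_summary(doc))
```

Because every output sentence is copied verbatim from the source, the result is guaranteed faithful and grammatical, which is exactly the trade-off described above.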
How Abstractive Summarization Works
The encoder-decoder architecture: read the full document, build understanding, then generate a compressed version.
The Encoder-Decoder Pipeline
Cross-Attention: The Key to Summarization
When generating each word of the summary, the decoder "looks back" at the encoded document. It learns which parts of the source are relevant for the current output position.
When generating "generates", the model attends strongly to "produces", "20%", and "oxygen" from the source. Note how it paraphrases ("generates" instead of "produces") while preserving meaning.
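The cross-attention step above can be made concrete with a small numpy sketch: one decoder query scores all source positions, and the softmax weights say how much each source token contributes to the word being generated. The random matrices here stand in for trained projection weights; sizes are toy values.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8                                  # toy model dimension
enc = rng.normal(size=(6, d))          # encoder output: one vector per source token
dec = rng.normal(size=(1, d))          # decoder state for the word being generated

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = dec @ Wq, enc @ Wk, enc @ Wv

# One attention weight per source token: "how relevant is this source word now?"
weights = softmax(Q @ K.T / np.sqrt(d))   # shape (1, 6); each row sums to 1
context = weights @ V                      # weighted mix fed into next-word prediction
print(weights.round(2))
```

In a trained summarizer this weight vector is recomputed for every generated word, which is how the decoder "looks back" at different source spans as the summary unfolds.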
Encoder-decoder models:
- Encoder builds full document understanding before any generation
- Decoder can attend to any part of the document at any time
- Natural fit for compression: many inputs to few outputs

Decoder-only LLMs:
- Work fine with instruction prompting ("summarize:")
- But: the document must fit in context with room left for the summary
- No separate encoding step means less efficient attention
Key Models
From specialized summarizers to general-purpose LLMs.
facebook/bart-large-cnn
Which Model Should You Use?
Long Document Strategies
When your document exceeds the model's context window, you have options.
Split document into chunks that fit the model's context window. Summarize each chunk, then optionally summarize the summaries.
- + Works with any model
- + Simple to implement
- + Parallelizable
- - Loses cross-chunk context
- - Quality depends on chunk boundaries
- - Multi-step latency
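The chunking strategy is essentially map-reduce. Here is a sketch with a trivial stand-in for the real model call; any `summarize(text) -> str` function (a BART pipeline, an LLM API call) would slot in, and the chunk size, overlap, and names are all illustrative.

```python
def chunk_text(words, chunk_size=100, overlap=20):
    """Split a word list into overlapping chunks that fit a context budget."""
    step = chunk_size - overlap
    return [words[i:i + chunk_size]
            for i in range(0, max(len(words) - overlap, 1), step)]

def summarize_long(text, summarize, chunk_size=100):
    """Map: summarize each chunk. Reduce: summarize the chunk summaries."""
    words = text.split()
    if len(words) <= chunk_size:
        return summarize(text)
    partials = [summarize(" ".join(c)) for c in chunk_text(words, chunk_size)]
    return summarize(" ".join(partials))

# Toy stand-in for a real model: keep the first 10 words
toy_summarize = lambda t: " ".join(t.split()[:10])
print(summarize_long("word " * 500, toy_summarize))
```

The overlap between chunks is a cheap mitigation for the cross-chunk context loss noted above: facts that straddle a boundary appear in both neighboring chunks.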
Build a tree structure: summarize paragraphs, then sections, then the whole document.
- + Preserves structure
- + Handles very long docs
- + Natural for reports
- - Complex pipeline
- - Error propagation
- - Needs document structure
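A two-level sketch of the hierarchical approach, assuming paragraphs are separated by blank lines. A real pipeline would add a section level on top and plug a model call in for `summarize`; the helper names here are illustrative.

```python
def hierarchical_summary(document, summarize):
    """Summarize each paragraph, then merge the paragraph summaries.
    `summarize` is a stand-in for any model call (BART, an LLM API, ...)."""
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    paragraph_summaries = [summarize(p) for p in paragraphs]
    return summarize(" ".join(paragraph_summaries))

# Toy stand-in: keep only the first sentence
first_sentence = lambda t: t.split(". ")[0].rstrip(".") + "."
print(hierarchical_summary("A. B. C.\n\nD. E.\n\nF.", first_sentence))
```

Note how errors propagate: a bad paragraph summary pollutes every level above it, which is the main weakness listed above.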
Use sparse attention patterns: local attention for nearby tokens, global attention for special tokens.
- + Single pass
- + Maintains global context
- + O(n), not O(n²)
- - Still limited context
- - Special model required
- - May miss distant relations
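The local + global pattern can be made concrete as a boolean attention mask, Longformer-style: each token attends to a sliding window of neighbors, while designated global tokens attend to everyone and are attended by everyone. The window size and global positions here are illustrative; only about O(n·window) positions are attended instead of all n².

```python
import numpy as np

def sparse_attention_mask(n, window=2, global_tokens=(0,)):
    """Local sliding-window attention plus a few global tokens."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True          # local window around token i
    for g in global_tokens:
        mask[g, :] = True              # global token sees every position
        mask[:, g] = True              # every position sees the global token
    return mask

m = sparse_attention_mask(10, window=2)
print(m.sum(), "of", m.size, "positions attended")
```

The "may miss distant relations" weakness is visible in the mask: two ordinary tokens far apart can only exchange information indirectly, through the global tokens or across several layers.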
Modern long-context LLMs (Claude at 200K tokens, GPT-4 at 128K) can process entire documents in a single pass.
- + Sees everything at once
- + No information loss
- + Simple
- - Expensive
- - Attention dilution possible
- - Context limits still exist
Context Length in Perspective
Approximate page counts assuming ~500 tokens per page. Actual counts vary with formatting.
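The arithmetic behind those page counts, under the stated ~500 tokens/page assumption. The context sizes are the commonly cited limits for each model; treat them as ballpark figures since providers revise them.

```python
TOKENS_PER_PAGE = 500  # rough average; actual density varies with formatting

contexts = {"BART": 1_024, "LED": 16_384, "GPT-4": 128_000, "Claude": 200_000}
pages = {model: tokens // TOKENS_PER_PAGE for model, tokens in contexts.items()}
for model, p in pages.items():
    print(f"{model:>7}: ~{p} pages")
```

So BART handles roughly a two-page memo, LED a ~30-page paper, and the long-context APIs book-length input.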
Benchmarks & Evaluation
Standard datasets and the ROUGE metrics used to measure summarization quality.
Understanding ROUGE Scores
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures overlap between the generated summary and human-written reference summaries.
| Dataset | Type | Size | Avg Length | Summary Style | SOTA (R1/R2/RL) |
|---|---|---|---|---|---|
| CNN/DailyMail | News | 300K articles | 800 words | Multi-sentence highlights | PEGASUS: 44.2 / 21.5 / 41.1 |
| XSum | Extreme Summary | 227K articles | 400 words | Single sentence | PEGASUS: 47.2 / 24.6 / 39.3 |
| arXiv | Scientific Papers | 215K papers | 6K words | Abstract | LED: 46.6 / 19.6 / 42.0 |
| PubMed | Medical | 133K abstracts | 3K words | Abstract | LED: 45.5 / 19.1 / 41.0 |
| MultiNews | Multi-Document | 56K clusters | 2K words (10 docs) | Comprehensive summary | PRIMERA: 49.9 / 21.1 / 25.9 |
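ROUGE-N itself is a short computation: count overlapping n-grams between candidate and reference, then report recall (ROUGE is recall-oriented) alongside precision and F1. A sketch that ignores the stemming and stopword options real implementations such as the `rouge-score` package offer:

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """ROUGE-N: n-gram overlap between a candidate summary and a reference."""
    def ngrams(text, n):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())   # clipped n-gram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return {"recall": recall, "precision": precision, "f1": f1}

scores = rouge_n("the amazon produces oxygen",
                 "the amazon rainforest produces much oxygen")
print(scores)
```

ROUGE-1 and ROUGE-2 use unigrams and bigrams respectively; ROUGE-L instead uses the longest common subsequence, which rewards in-order matches without requiring them to be contiguous.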
Code Examples
From quick BART inference to hierarchical summarization pipelines.
from transformers import pipeline
# Load BART fine-tuned on CNN/DailyMail
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = """
The Amazon rainforest, often referred to as the "lungs of the Earth,"
produces approximately 20% of the world's oxygen. Spanning nine countries
and covering 5.5 million square kilometers, it is the largest tropical
rainforest on the planet. However, deforestation rates have accelerated
dramatically in recent years, threatening not only biodiversity but also
global climate stability. Scientists warn that losing the Amazon could
trigger irreversible climate tipping points.
"""
summary = summarizer(
    article,
    max_length=60,
    min_length=20,
    do_sample=False,
)
print(summary[0]['summary_text'])
# Output: The Amazon rainforest produces 20% of the world's oxygen.
# Deforestation threatens biodiversity and global climate stability.

End-to-End Example
See how different approaches handle the same news article.
Quick Reference
Models:
- BART-CNN for news/short docs
- LED for papers/reports (16K)
- Claude/GPT-4 for best quality

Length strategy:
- Under 16K: use LED directly
- Under 200K: use Claude directly
- Longer: chunk + hierarchical

Evaluation:
- ROUGE for quick comparison
- BERTScore for semantic similarity
- Human eval for production
Use Cases
- ✓ News summarization
- ✓ Research paper digests
- ✓ Meeting notes
- ✓ Legal document summaries
- ✓ Email tl;dr
Architectural Patterns
Extractive Summarization
Select important sentences from the source.
- + Faithful to source
- + Fast
- + No hallucination
- - Less fluent
- - Can't paraphrase
- - Fixed to source text
Abstractive Summarization
Generate new text that captures the meaning.
- + Fluent output
- + Can condense more
- + Natural reading
- - May hallucinate
- - Slower
- - Needs more compute
LLM Summarization
Use large language models with summarization prompts.
- + Handles any format
- + Controllable style
- + Long context
- - Expensive
- - May miss details
- - Inconsistent
Implementations
API Services
Claude
Anthropic. 200K context; excellent for long documents.
GPT-4o
OpenAI. 128K context; great instruction following.
Open Source
Benchmarks
Quick Facts
- Input: Text
- Output: Text
- Implementations: 3 open source, 2 API
- Patterns: 3 approaches