Long-Context Summarization
Summarize 100K+ token inputs like transcripts, hearings, or books with structured outputs.
How Long-Context Summarization Works
When documents exceed your model's context window, you need a strategy: anything from chunking approaches to modern long-context models that handle 200K+ tokens in one pass.
The Problem: Documents Are Longer Than Context Windows
You have a 50-page report. You want a summary. But your model can only see 4,096 tokens at once. What do you do?
With the older 4K-16K context windows, you had no choice but to chunk documents, summarize the pieces separately, and somehow combine them; information was inevitably lost.
Today, Claude, GPT-4, and Gemini handle 128K-1M tokens. Most documents fit in one pass, and chunking strategies are now for edge cases, not the default.
Summarization Strategies
Four approaches to summarizing long documents, from chunking-based methods to direct long-context processing.
Map-Reduce
Summarize each chunk, then summarize the summaries
Split the document into chunks that fit your model. Summarize each chunk independently (map phase). Then combine all chunk summaries and summarize again (reduce phase).
Pros:
- Parallelizable
- Works with any model
- Handles arbitrarily long docs

Cons:
- Loses cross-chunk context
- Multiple API calls
- Can miss connections between sections
Best for: very long documents (100K+ tokens) with independent sections.
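A minimal sketch of the two phases, assuming the Anthropic Python SDK; the helper names, chunk sizes, and prompts are illustrative rather than a fixed recipe:

```python
from anthropic import Anthropic

client = Anthropic()
MODEL = "claude-sonnet-4-20250514"  # example model id

def complete(prompt: str, max_tokens: int = 1000) -> str:
    response = client.messages.create(
        model=MODEL,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def chunk_text(text: str, chunk_chars: int = 16_000, overlap: int = 800) -> list[str]:
    # Naive fixed-size chunking with overlap; the chunking strategies
    # described below give better boundaries.
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        start += chunk_chars - overlap
    return chunks

def map_reduce_summarize(document: str) -> str:
    # Map phase: summarize each chunk independently (easy to parallelize).
    partials = [
        complete(f"Summarize this section in about 150 words:\n\n{chunk}")
        for chunk in chunk_text(document)
    ]
    # Reduce phase: combine the chunk summaries into one final summary.
    combined = "\n\n".join(partials)
    return complete(
        f"Combine these section summaries into one coherent 500-word summary:\n\n{combined}"
    )
```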
Which Strategy Should You Use?
Chunking Strategies: How to Split Documents
When you must chunk, how you split matters. The wrong boundaries can cut ideas in half.
Fixed-size: split every N tokens, with optional overlap.
chunks = text_splitter.split_text(doc, chunk_size=4000, overlap=200)

Recursive: try to split at paragraphs, then sentences, then words.
RecursiveCharacterTextSplitter(separators=["\n\n", "\n", ". ", " "])

Semantic: use embeddings to find natural topic boundaries.
SemanticChunker(embeddings=OpenAIEmbeddings(), breakpoint_threshold=0.3)

Structural: split by headers, sections, or document structure.
MarkdownHeaderTextSplitter(headers_to_split_on=[("#", "H1"), ("##", "H2")])

Chunk Size Recommendations
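A rough rule of thumb (an assumption here, not stated above) is about four characters per token of English text. The sketch below uses that, plus tiktoken as a stand-in tokenizer, to check document length before committing to a chunking strategy and to turn a token budget into splitter settings:

```python
import tiktoken  # assumption: OpenAI's tokenizer as a rough proxy for token counts
from langchain_text_splitters import RecursiveCharacterTextSplitter

def count_tokens(text: str) -> int:
    # cl100k_base is a stand-in; exact counts differ per model and tokenizer.
    return len(tiktoken.get_encoding("cl100k_base").encode(text))

def needs_chunking(document: str, context_limit_tokens: int = 200_000) -> bool:
    """Check document length before choosing a strategy."""
    return count_tokens(document) > context_limit_tokens

def make_splitter(chunk_tokens: int = 4_000, overlap_tokens: int = 200):
    # ~4 characters per token converts a token budget into character sizes.
    return RecursiveCharacterTextSplitter(
        chunk_size=chunk_tokens * 4,
        chunk_overlap=overlap_tokens * 4,
        separators=["\n\n", "\n", ". ", " "],
    )

document = open("long_report.txt").read()
chunks = make_splitter().split_text(document) if needs_chunking(document) else [document]
```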
Model Context Windows (2024-2025)
Know your model's limits. Modern long-context models have fundamentally changed the game.
Just because a model can accept 200K tokens doesn't mean it attends equally to all of them. Research shows LLMs often "lose" information in the middle of long contexts. For critical summarization, consider placing the most important content at the beginning or end.
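One way to apply that, sketched below with an illustrative prompt layout (not a prescribed template): repeat the key instructions after the document, so the most important guidance sits at both ends of the context rather than the middle.

```python
def build_summary_prompt(document: str, focus: str) -> str:
    # Key instructions appear before AND after the long document, so they
    # occupy the start and end of the context rather than the middle.
    instructions = f"Summarize the document below, focusing on: {focus}."
    return (
        f"{instructions}\n\n"
        f"<document>\n{document}\n</document>\n\n"
        f"Reminder: {instructions}"
    )
```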
Code Examples
From LangChain's built-in chains to direct long-context API calls.
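The LangChain route is sketched below; it assumes the classic load_summarize_chain API (deprecated in newer LangChain releases in favor of LCEL) plus the langchain-anthropic and langchain-text-splitters packages, and the model id and chunk sizes are illustrative:

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")  # example model id

# Split, wrap chunks in Documents, then run the built-in map-reduce chain.
splitter = RecursiveCharacterTextSplitter(chunk_size=16_000, chunk_overlap=800)
long_text = open("long_report.txt").read()
docs = [Document(page_content=chunk) for chunk in splitter.split_text(long_text)]

chain = load_summarize_chain(llm, chain_type="map_reduce")
summary = chain.run(docs)
```

When the document already fits in the context window, the direct call below is simpler and avoids chunking losses entirely.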
from anthropic import Anthropic

client = Anthropic()

def summarize_long_document(document: str, max_summary_words: int = 500) -> str:
    """
    Summarize a long document using Claude's 200K context window.
    No chunking needed for most documents.
    """
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # or claude-3-5-sonnet
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""Summarize the following document in approximately
{max_summary_words} words. Focus on:
1. Key findings and conclusions
2. Important data points and metrics
3. Action items or recommendations

Document:
{document}

Summary:"""
        }]
    )
    return response.content[0].text

# For very long documents (100K+ tokens), still simple:
with open("long_report.txt") as f:
    full_document = f.read()
summary = summarize_long_document(full_document)

Use direct long-context calls with Claude or GPT-4 Turbo for most documents; only reach for chunking strategies when documents exceed 200K tokens (recommended: Long-Context with Claude). When a document is longer than any model's context window, use Map-Reduce for parallel processing or Hierarchical for structured content (recommended: Map-Reduce, Hierarchical).

Quick Reference
- Use long-context models by default
- Claude 200K / GPT-4 128K fit most docs
- Gemini 1M for truly massive content
- No chunking = no information loss
- Map-Reduce for parallel processing
- Refine for sequential narratives
- Hierarchical for structured docs
- Use semantic chunking when possible
- Check doc length before choosing strategy
- Place key content at start/end
- Use overlap to preserve context
- Consider cost: long context = more tokens
Use Cases
- ✓ Hour-long meetings
- ✓ Earnings calls
- ✓ Legal discovery
- ✓ Book/episode recaps
Architectural Patterns
Sliding-Window + Merge
Chunk then merge summaries hierarchically.
Native Long-Context LLM
Directly ingest long sequences (1M+ tokens).
Implementations
API Services
Gemini 1.5 Pro (Google): 1M context for very long inputs.
Claude 3.5 Sonnet 200K (Anthropic): high-quality long-context summaries.
Open Source
Benchmarks
Quick Facts
- Input: Text
- Output: Text
- Implementations: 1 open source, 2 API
- Patterns: 2 approaches