Text Summarization | 2015 | en

CNN/DailyMail Summarization

300K news articles with multi-sentence summaries. Standard benchmark for abstractive summarization.

Metrics: ROUGE-1, ROUGE-2, ROUGE-L
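As a rough illustration of what these three metrics measure, the following is a minimal sketch in pure Python. It assumes lowercased whitespace tokenization and reports F1 only. The helper names (ngrams, f1, rouge_n, lcs_length, rouge_l) are made up for this example. Standard ROUGE implementations typically add their own tokenization and optional stemming, so scores from this sketch will not match published numbers exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0:  # also covers an empty candidate or reference
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    c = candidate.lower().split()
    r = reference.lower().split()
    return f1(lcs_length(c, r), len(c), len(r))


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat lay on the mat"
    print(rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref))
```

ROUGE-1 and ROUGE-2 count shared unigrams and bigrams, while ROUGE-L rewards the longest in-order token sequence shared by the two texts. All three measure surface overlap rather than meaning or factual accuracy.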
Saturated Benchmark
Last significant update: Jun 2022

ROUGE-based evaluation on this benchmark is saturated: no significant improvements have been reported since 2022. Modern summarization evaluation instead relies on LLM-as-judge methods (such as G-Eval), human preference evaluations, or factual consistency metrics.

No benchmark results indexed for this dataset yet.
