Text Summarization | 2015 | English
CNN/DailyMail Summarization
Roughly 300K news articles, each paired with a multi-sentence summary. A standard benchmark for abstractive summarization.
Saturated benchmark
Last significant update: Jun 2022
ROUGE-based evaluation on this benchmark is saturated: reported scores have not improved significantly since 2022. Modern summarization systems are instead evaluated with LLM-as-judge methods (e.g. G-Eval), human preference evaluations, or factual-consistency metrics.
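For context on what the saturated metric measures, here is a minimal sketch of ROUGE-N (clipped n-gram overlap) and ROUGE-L (longest-common-subsequence overlap), the F1 variants usually reported on CNN/DailyMail. It is a simplified illustration, not the official scorer: it assumes plain lowercase whitespace tokenization, whereas reference implementations also strip punctuation, can apply stemming, and compute summary-level ROUGE-Lsum over sentence splits. The function names are hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (assumption): lowercase and split on whitespace only.
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of n-grams, so repeated n-grams are counted.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, cand_total: int, ref_total: int) -> float:
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    # Clipped overlap: each n-gram counts at most min(count in cand, count in ref).
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest common subsequence, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b):
            curr.append(prev[j] + 1 if x == y else max(prev[j + 1], curr[j]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    system = "the cat sat on the mat"
    gold = "the cat lay on the mat"
    print(f"ROUGE-1 F1: {rouge_n(system, gold, 1):.4f}")
    print(f"ROUGE-2 F1: {rouge_n(system, gold, 2):.4f}")
    print(f"ROUGE-L F1: {rouge_l(system, gold):.4f}")
```

Because these scores reward surface n-gram and subsequence overlap with a single reference, a summary can paraphrase correctly and still score low, or overlap heavily while stating something unsupported. This is one reason evaluation has shifted toward the judge-based and factual-consistency methods named above.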
No benchmark results indexed for this dataset yet.