Who leads the cnn-/-daily-mail benchmark?

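Scrambled code + broken (alter), from the paper Universal Evasion Attacks on Summarization Scoring (2022), currently leads cnn-/-daily-mail with a score of 48.18 on Rouge 1. It also leads Rouge L (45.35) but ranks 18th on Rouge 2 (19.84), where SumHiS leads with 32.52.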

What is the state-of-the-art score on cnn-/-daily-mail?

The state-of-the-art result on cnn-/-daily-mail is 48.18 (Rouge 1), achieved by Scrambled code + broken (alter) as of 2025.

How many models are tracked on cnn-/-daily-mail?

Codesota tracks 33 models on cnn-/-daily-mail across 4 metrics.

When was the cnn-/-daily-mail leaderboard last updated?

The cnn-/-daily-mail leaderboard on Codesota includes results through 2025, with the earliest tracked result from 2017.

Codesota · Benchmark · cnn-/-daily-mail

cnn-/-daily-mail.

Name: cnn-/-daily-mail Benchmark Results
Creator: Unknown
Published: 2017-01-01
License: https://creativecommons.org/licenses/by/4.0/

cnn-/-daily-mail (CNN/Daily Mail) is a text summarization benchmark indexed on Codesota. This page tracks published model results, top scores per metric, and the SOTA timeline for cnn-/-daily-mail.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

Rouge 1

Rouge 1 (ROUGE-1, the unigram overlap between a generated summary and the reference summary) is one of the four metrics reported for cnn-/-daily-mail. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
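As an illustration of what this kind of metric measures, here is a minimal sketch of an n-gram overlap F1 score, where n=1 corresponds to Rouge 1 and n=2 to Rouge 2. It assumes lowercased whitespace tokenization and a single reference summary; `ngram_counts` and `rouge_n` are hypothetical helper names. The papers listed on this page generally rely on dedicated ROUGE toolkits whose preprocessing differs, so this sketch is not expected to reproduce the leaderboard numbers.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Sketch of an n-gram overlap F1 score in the range 0..1.

    Assumption: lowercased whitespace tokenization, one reference.
    """
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    summary = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(round(rouge_n(summary, reference, n=1), 4))
    print(round(rouge_n(summary, reference, n=2), 4))
```

The table values appear to be on a 0 to 100 scale, so a result from this sketch would need to be multiplied by 100 before any rough comparison.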

Higher is better

Trust tiers for Rouge 1: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published: an earlier or same-year result already scored better.
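The muting rule can be read as a simple comparison against all results from the same or an earlier year. The sketch below is one interpretation of that sentence, not Codesota's actual implementation; `muted_rows` is a hypothetical helper, and the sample rows are five Rouge 1 entries copied from this page.

```python
def muted_rows(results, higher_is_better=True):
    """Return the names of rows that were not state of the art when published.

    `results` is a list of (name, score, year) tuples. A row counts as muted
    if any row from the same or an earlier year has a strictly better score.
    """
    muted = []
    for name, score, year in results:
        beaten = any(
            other_year <= year
            and (other_score > score if higher_is_better else other_score < score)
            for _, other_score, other_year in results
        )
        if beaten:
            muted.append(name)
    return muted


# Five Rouge 1 rows taken from the leaderboard on this page.
SAMPLE = [
    ("Lead-3", 40.34, 2017),
    ("ML + RL (Paulus et al., 2017)", 39.87, 2017),
    ("NeuSUM", 41.59, 2018),
    ("Bottom-Up Sum", 41.22, 2018),
    ("BertSumExt", 43.85, 2019),
]

if __name__ == "__main__":
    print(muted_rows(SAMPLE))
```

Within this five-row sample, ML + RL is beaten by Lead-3 from the same year and Bottom-Up Sum by NeuSUM from the same year, so those two would be muted under this reading.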

Rank	Model	Notes	Trust	Score	Year	Links
01	Scrambled code + broken (alter)	From paper: Universal Evasion Attacks on Summarization Scoring	verified	48.18	2022	Paper, Code
02	BRIO	BRIO: Bringing Order to Abstractive Summarization. ACL 2022. BART-large with contrastive learning. SOTA on CNN/DM at time of publication. Score from Table 1 of the paper.	verified	47.78	2022	Paper
03	PEGASUS + SummaReranker	From paper: SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization	verified	47.16	2022	Paper, Code
04	GPT-3.5-Turbo + TriSum Rationale	GPT-3.5-Turbo prompted with TriSum structured rationale. Best ROUGE-1 in TriSum paper Table 2. Zero-shot with structured chain-of-thought prompting.	verified	46.7	2024	Paper
05	BRIDO	BRIDO: Bringing Democratic Order to Abstractive Summarization. arXiv Feb 2025. BART-based with democratic contrastive learning. Trades slight ROUGE drop vs BRIO for better factual consistency (3.82% G-Eval improvement). Table 2.	verified	45.81	2025	Paper
06	TriSum-J	TriSum: Learning Summarization Ability from LLMs with Structured Rationale. NAACL 2024. TriSum-J = joint learning stage. Table 2.	verified	45.7	2024	Paper
07	Fourier Transformer	From paper: Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator	verified	44.76	2023	Paper, Code
08	GLM-XXLarge	From paper: GLM: General Language Model Pretraining with Autoregressive Blank Infilling	verified	44.7	2021	Paper, Code
09	HAT-BART	From paper: Hierarchical Learning for Generation with Long Source Sequences	verified	44.48	2021	Paper
10	MatchSum (RoBERTa-base)	From paper: Extractive Summarization as Text Matching	verified	44.41	2020	Paper, Code
11	Hie-BART	From paper: Hie-BART: Document Summarization with Hierarchical BART	verified	44.35	2021	Paper
12	MatchSum (BERT-base)	From paper: Extractive Summarization as Text Matching	verified	44.22	2020	Paper, Code
13	BertSumExt	From paper: Text Summarization with Pretrained Encoders	verified	43.85	2019	Paper, Code
14	BigBird-Pegasus	From paper: Big Bird: Transformers for Longer Sequences	verified	43.84	2020	Paper, Code
15	T5-11B	From paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	verified	43.52	2019	Paper, Code
16	SumHiS	SumHiS: Extractive Summarization Exploiting Hidden Structure. arXiv Jun 2024. With semantic filtering. Extractive approach; ROUGE-2 of 32.52 exceeds prior extractive SOTA by 10%. Table 1.	verified	43.48	2024	Paper
17	BERTSUM+Transformer	From paper: Fine-tune BERT for Extractive Summarization	verified	43.25	2019	Paper, Code
18	UniLM (Abstractive Summarization)	From paper: Unified Language Model Pre-training for Natural Language Understanding and Generation	verified	43.08	2019	Paper, Code
19	Selector+Pointer Generator	From paper: Mixture Content Selection for Diverse Sequence Generation	verified	41.72	2019	Paper, Code
20	NeuSUM	From paper: Neural Document Summarization by Jointly Learning to Score and Select Sentences	verified	41.59	2018	Paper, Code
21	Bottom-Up Sum	From paper: Bottom-Up Abstractive Summarization	verified	41.22	2018	Paper, Code
22	Llama-2-70B-chat	Llama-2-70B-chat with 7-shot in-context learning on CNN/DailyMail. Best overall ICL result in arXiv:2507.05123 (Jul 2025). Outperforms zero-shot by substantial margin.	verified	40.98	2025	Paper
23	TaLK Convolutions (Deep)	From paper: Time-aware Large Kernel Convolutions	verified	40.59	2020	Paper, Code
24	Lead-3	From paper: Get To The Point: Summarization with Pointer-Generator Networks	verified	40.34	2017	Paper, Code
25	TaLK Convolutions (Standard)	From paper: Time-aware Large Kernel Convolutions	verified	40.03	2020	Paper, Code
26	ML + RL (Paulus et al., 2017)	From paper: A Deep Reinforced Model for Abstractive Summarization	verified	39.87	2017	Paper, Code
27	DynamicConv	From paper: Pay Less Attention with Lightweight and Dynamic Convolutions	verified	39.84	2019	Paper, Code
28	LightConv	From paper: Pay Less Attention with Lightweight and Dynamic Convolutions	verified	39.52	2019	Paper, Code
29	Synthesizer (R+V)	From paper: Synthesizer: Rethinking Self-Attention in Transformer Models	verified	38.57	2020	Paper, Code
30	ML + Intra-Attention (Paulus et al., 2017)	From paper: A Deep Reinforced Model for Abstractive Summarization	verified	38.3	2017	Paper, Code
31	Mistral-7B-Instruct-v0.1	Zero-shot evaluation of Mistral-7B-Instruct-v0.1 on CNN/DailyMail. Best zero-shot 7B result in arXiv:2507.05123 (Jul 2025). Standard ROUGE scoring against reference highlights.	verified	37.44	2025	Paper
32	C2F + ALTERNATE	From paper: Coarse-to-Fine Attention Models for Document Summarization	verified	31.1	2017	Paper
33	GPT-2	From paper: Language Models are Unsupervised Multitask Learners	verified	29.34	2019	Paper, Code

Rouge L

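Rouge L (ROUGE-L, based on the longest common subsequence shared by a generated summary and the reference summary) is one of the four metrics reported for cnn-/-daily-mail. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.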
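To make the longest-common-subsequence idea concrete, here is a minimal sketch of an LCS-based F1 score. It assumes lowercased whitespace tokenization and treats each summary as one token sequence; `lcs_length` and `rouge_l` are hypothetical helper names. Some ROUGE toolkits also offer a summary-level variant that works sentence by sentence, so this sketch should be read as an illustration of the principle rather than the scoring used by the papers in the table.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Sketch of an LCS-based F1 score in the range 0..1.

    Assumption: lowercased whitespace tokenization, one reference.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(round(rouge_l("the cat sat on the mat", "the cat lay on the mat"), 4))
```

Unlike the n-gram metrics, the subsequence does not have to be contiguous, which is why word order matters here while gaps between matching words are tolerated.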

Higher is better

Trust tiers for Rouge L: verified, paper, vendor, community, unverified

Rank	Model	Notes	Trust	Score	Year	Links
01	Scrambled code + broken (alter)	From paper: Universal Evasion Attacks on Summarization Scoring	verified	45.35	2022	Paper, Code
02	BRIO	BRIO: Bringing Order to Abstractive Summarization. ACL 2022.	verified	44.55	2022	Paper
03	PEGASUS + SummaReranker	From paper: SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization	verified	43.87	2022	Paper, Code
04	BRIDO	BRIDO: Bringing Democratic Order to Abstractive Summarization. arXiv Feb 2025. Table 2.	verified	42.51	2025	Paper
05	SumHiS	SumHiS: Extractive Summarization Exploiting Hidden Structure. arXiv Jun 2024. With semantic filtering. Table 1.	verified	42.44	2024	Paper
06	TriSum-J	TriSum: Learning Summarization Ability from LLMs with Structured Rationale. NAACL 2024. TriSum-J = joint learning stage. Table 2.	verified	41.9	2024	Paper
07	HAT-BART	From paper: Hierarchical Learning for Generation with Long Source Sequences	verified	41.52	2021	Paper
08	GLM-XXLarge	From paper: GLM: General Language Model Pretraining with Autoregressive Blank Infilling	verified	41.4	2021	Paper, Code
09	Fourier Transformer	From paper: Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator	verified	41.34	2023	Paper, Code
10	Hie-BART	From paper: Hie-BART: Document Summarization with Hierarchical BART	verified	41.05	2021	Paper
11	BigBird-Pegasus	From paper: Big Bird: Transformers for Longer Sequences	verified	40.74	2020	Paper, Code
12	GPT-3.5-Turbo + TriSum Rationale	GPT-3.5-Turbo prompted with TriSum structured rationale. TriSum paper Table 2.	verified	40.7	2024	Paper
13	T5-11B	From paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	verified	40.69	2019	Paper, Code
14	MatchSum (RoBERTa-base)	From paper: Extractive Summarization as Text Matching	verified	40.55	2020	Paper, Code
15	MatchSum (BERT-base)	From paper: Extractive Summarization as Text Matching	verified	40.38	2020	Paper, Code
16	UniLM (Abstractive Summarization)	From paper: Unified Language Model Pre-training for Natural Language Understanding and Generation	verified	40.34	2019	Paper, Code
17	BertSumExt	From paper: Text Summarization with Pretrained Encoders	verified	39.9	2019	Paper, Code
18	BERTSUM+Transformer	From paper: Fine-tune BERT for Extractive Summarization	verified	39.63	2019	Paper, Code
19	Selector+Pointer Generator	From paper: Mixture Content Selection for Diverse Sequence Generation	verified	38.79	2019	Paper, Code
20	Bottom-Up Sum	From paper: Bottom-Up Abstractive Summarization	verified	38.34	2018	Paper, Code
21	NeuSUM	From paper: Neural Document Summarization by Jointly Learning to Score and Select Sentences	verified	37.98	2018	Paper, Code
22	ML + RL (Paulus et al., 2017)	From paper: A Deep Reinforced Model for Abstractive Summarization	verified	36.9	2017	Paper, Code
23	TaLK Convolutions (Deep)	From paper: Time-aware Large Kernel Convolutions	verified	36.81	2020	Paper, Code
24	DynamicConv	From paper: Pay Less Attention with Lightweight and Dynamic Convolutions	verified	36.73	2019	Paper, Code
25	Lead-3	From paper: Get To The Point: Summarization with Pointer-Generator Networks	verified	36.57	2017	Paper, Code
26	LightConv	From paper: Pay Less Attention with Lightweight and Dynamic Convolutions	verified	36.51	2019	Paper, Code
27	TaLK Convolutions (Standard)	From paper: Time-aware Large Kernel Convolutions	verified	36.13	2020	Paper, Code
28	Synthesizer (R+V)	From paper: Synthesizer: Rethinking Self-Attention in Transformer Models	verified	35.95	2020	Paper, Code
29	ML + Intra-Attention (Paulus et al., 2017)	From paper: A Deep Reinforced Model for Abstractive Summarization	verified	35.49	2017	Paper, Code
30	C2F + ALTERNATE	From paper: Coarse-to-Fine Attention Models for Document Summarization	verified	28.8	2017	Paper
31	Llama-2-70B-chat	Llama-2-70B-chat with 7-shot ICL on CNN/DailyMail. arXiv:2507.05123 (Jul 2025).	verified	27.52	2025	Paper
32	GPT-2	From paper: Language Models are Unsupervised Multitask Learners	verified	26.58	2019	Paper, Code
33	Mistral-7B-Instruct-v0.1	Zero-shot evaluation of Mistral-7B-Instruct-v0.1 on CNN/DailyMail. arXiv:2507.05123 (Jul 2025).	verified	24.53	2025	Paper

Ppl

Ppl (perplexity) is one of the four metrics reported for cnn-/-daily-mail; only two tracked models report it. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
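Assuming Ppl stands for perplexity in its usual sense, it is the exponential of the average negative log-probability a model assigns to each reference token, so a model that assigns higher probability to the reference text gets a lower perplexity. A minimal sketch follows; `perplexity` is a hypothetical helper and the log-probabilities are made-up illustrative inputs, not values from any paper on this page.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Computes exp(-mean(log p)); raises ValueError on empty input.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Illustrative input: four tokens, each assigned probability 0.25.
    print(perplexity([math.log(0.25)] * 4))
```

A model that spreads its probability evenly over k choices at every step has a perplexity of k, which gives the number an intuitive reading as an effective branching factor.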

Lower is better

Trust tiers for Ppl: verified, paper, vendor, community, unverified

Rank	Model	Notes	Trust	Score	Year	Links
01	C2F + ALTERNATE	From paper: Coarse-to-Fine Attention Models for Document Summarization	verified	23.6	2017	Paper
02	Bottom-Up Sum	From paper: Bottom-Up Abstractive Summarization	verified	32.75	2018	Paper, Code

Rouge 2

Rouge 2 (ROUGE-2, the bigram overlap between a generated summary and the reference summary) is one of the four metrics reported for cnn-/-daily-mail. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Rouge 2: verified, paper, vendor, community, unverified

Rank	Model	Notes	Trust	Score	Year	Links
01	SumHiS	SumHiS: Extractive Summarization Exploiting Hidden Structure. arXiv Jun 2024. SOTA ROUGE-2 for extractive models, exceeding prior best by 10%. Table 1.	verified	32.52	2024	Paper
02	BRIO	BRIO: Bringing Order to Abstractive Summarization. ACL 2022.	verified	23.75	2022	Paper
03	GPT-3.5-Turbo + TriSum Rationale	GPT-3.5-Turbo prompted with TriSum structured rationale. TriSum paper Table 2.	verified	23.5	2024	Paper
04	BRIDO	BRIDO: Bringing Democratic Order to Abstractive Summarization. arXiv Feb 2025. Table 2.	verified	22.95	2025	Paper
05	TriSum-J	TriSum: Learning Summarization Ability from LLMs with Structured Rationale. NAACL 2024. TriSum-J = joint learning stage. Table 2.	verified	22.7	2024	Paper
06	PEGASUS + SummaReranker	From paper: SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization	verified	22.55	2022	Paper, Code
07	Fourier Transformer	From paper: Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator	verified	21.55	2023	Paper, Code
08	T5-11B	From paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	verified	21.55	2019	Paper, Code
09	GLM-XXLarge	From paper: GLM: General Language Model Pretraining with Autoregressive Blank Infilling	verified	21.4	2021	Paper, Code
10	Hie-BART	From paper: Hie-BART: Document Summarization with Hierarchical BART	verified	21.37	2021	Paper
11	HAT-BART	From paper: Hierarchical Learning for Generation with Long Source Sequences	verified	21.31	2021	Paper
12	BigBird-Pegasus	From paper: Big Bird: Transformers for Longer Sequences	verified	21.11	2020	Paper, Code
13	MatchSum (RoBERTa-base)	From paper: Extractive Summarization as Text Matching	verified	20.86	2020	Paper, Code
14	MatchSum (BERT-base)	From paper: Extractive Summarization as Text Matching	verified	20.62	2020	Paper, Code
15	UniLM (Abstractive Summarization)	From paper: Unified Language Model Pre-training for Natural Language Understanding and Generation	verified	20.43	2019	Paper, Code
16	BertSumExt	From paper: Text Summarization with Pretrained Encoders	verified	20.34	2019	Paper, Code
17	BERTSUM+Transformer	From paper: Fine-tune BERT for Extractive Summarization	verified	20.24	2019	Paper, Code
18	Scrambled code + broken (alter)	From paper: Universal Evasion Attacks on Summarization Scoring	verified	19.84	2022	Paper, Code
19	NeuSUM	From paper: Neural Document Summarization by Jointly Learning to Score and Select Sentences	verified	19.01	2018	Paper, Code
20	TaLK Convolutions (Deep)	From paper: Time-aware Large Kernel Convolutions	verified	18.97	2020	Paper, Code
21	Selector+Pointer Generator	From paper: Mixture Content Selection for Diverse Sequence Generation	verified	18.74	2019	Paper, Code
22	Bottom-Up Sum	From paper: Bottom-Up Abstractive Summarization	verified	18.68	2018	Paper, Code
23	TaLK Convolutions (Standard)	From paper: Time-aware Large Kernel Convolutions	verified	18.45	2020	Paper, Code
24	Lead-3	From paper: Get To The Point: Summarization with Pointer-Generator Networks	verified	17.7	2017	Paper, Code
25	Llama-2-70B-chat	Llama-2-70B-chat with 7-shot ICL on CNN/DailyMail. arXiv:2507.05123 (Jul 2025).	verified	17.23	2025	Paper
26	Mistral-7B-Instruct-v0.1	Zero-shot evaluation of Mistral-7B-Instruct-v0.1 on CNN/DailyMail. arXiv:2507.05123 (Jul 2025).	verified	16.42	2025	Paper
27	DynamicConv	From paper: Pay Less Attention with Lightweight and Dynamic Convolutions	verified	16.25	2019	Paper, Code
28	Synthesizer (R+V)	From paper: Synthesizer: Rethinking Self-Attention in Transformer Models	verified	16.24	2020	Paper, Code
29	LightConv	From paper: Pay Less Attention with Lightweight and Dynamic Convolutions	verified	15.97	2019	Paper, Code
30	ML + RL (Paulus et al., 2017)	From paper: A Deep Reinforced Model for Abstractive Summarization	verified	15.82	2017	Paper, Code
31	C2F + ALTERNATE	From paper: Coarse-to-Fine Attention Models for Document Summarization	verified	15.4	2017	Paper
32	ML + Intra-Attention (Paulus et al., 2017)	From paper: A Deep Reinforced Model for Abstractive Summarization	verified	14.81	2017	Paper, Code
