Codesota · Benchmark · cnn-/-daily-mail

CNN / Daily Mail.

CNN / Daily Mail is a news summarization benchmark indexed on Codesota. This page tracks published model results, top scores per metric, and the state-of-the-art (SOTA) timeline for the benchmark.

§ 01 · SOTA history

Year over year.
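A SOTA timeline of this kind can be derived from leaderboard rows by keeping, for each year, only a result that beats the best score of all earlier years. The sketch below is one plausible way to do that and is not Codesota's actual implementation; the `Result` type and the `sota_timeline` function are hypothetical names, and `ROWS` is a small subset of the Rouge 1 results listed on this page.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    score: float
    year: int


def sota_timeline(results, higher_is_better=True):
    """Return, in chronological order, the results that set a new best score."""
    # Flip the sign so that "larger is better" holds in both directions.
    sign = 1 if higher_is_better else -1
    timeline = []
    best = None
    for year in sorted({r.year for r in results}):
        # Best result published in this year.
        top = max((r for r in results if r.year == year),
                  key=lambda r: sign * r.score)
        # Keep it only if it improves on every earlier year.
        if best is None or sign * top.score > sign * best.score:
            timeline.append(top)
            best = top
    return timeline


# A subset of the Rouge 1 rows from this page (illustration only).
ROWS = [
    Result("Lead-3", 40.34, 2017),
    Result("ML + RL (Paulus et al., 2017)", 39.87, 2017),
    Result("NeuSUM", 41.59, 2018),
    Result("Bottom-Up Sum", 41.22, 2018),
    Result("BertSumExt", 43.85, 2019),
    Result("T5-11B", 43.52, 2019),
    Result("MatchSum (RoBERTa-base)", 44.41, 2020),
    Result("GLM-XXLarge", 44.7, 2021),
    Result("BRIO", 47.78, 2022),
    Result("Fourier Transformer", 44.76, 2023),
]

for r in sota_timeline(ROWS):
    print(r.year, r.model, r.score)
```

With this subset, the 2023 entry does not appear in the timeline because it does not exceed the 2022 best.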

§ 02 · Leaderboard

Results by metric.

Rouge 1

Rouge 1 (ROUGE-1, the unigram overlap between a generated summary and its reference) is one of the evaluation metrics reported for CNN / Daily Mail. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
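For readers unfamiliar with the metric, the following is a minimal sketch of an n-gram overlap F1 in the style of ROUGE-N. It assumes lowercased whitespace tokenization and a single reference; published scores usually come from an official ROUGE toolkit with its own tokenization and optional stemming, and are typically reported on a 0 to 100 scale, so this sketch will not reproduce the leaderboard numbers. The function names `ngrams` and `rouge_n` are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """F1 over clipped n-gram matches between candidate and reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(round(rouge_n(candidate, reference, n=1), 4))  # 5 of 6 unigrams match
print(round(rouge_n(candidate, reference, n=2), 4))  # 3 of 5 bigrams match
```

Setting `n=2` gives the bigram variant that the Rouge 2 table on this page refers to.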

Trust tiers for Rouge 1: verified, paper, vendor, community, unverified

Rank | Model | Trust | Score | Year | Links | Notes
01 | Scrambled code + broken (alter) | verified | 48.18 | 2022 | Paper, Code | From paper: Universal Evasion Attacks on Summarization Scoring
02 | BRIO | verified | 47.78 | 2022 | Paper | BRIO: Bringing Order to Abstractive Summarization. ACL 2022. BART-large with contrastive learning. SOTA on CNN/DM at time of publication. Score from Table 1 of the paper.
03 | PEGASUS + SummaReranker | verified | 47.16 | 2022 | Paper, Code | From paper: SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
04 | GPT-3.5-Turbo + TriSum Rationale | verified | 46.7 | 2024 | Paper | GPT-3.5-Turbo prompted with TriSum structured rationale. Best ROUGE-1 in TriSum paper Table 2. Zero-shot with structured chain-of-thought prompting.
05 | BRIDO | verified | 45.81 | 2025 | Paper | BRIDO: Bringing Democratic Order to Abstractive Summarization. arXiv Feb 2025. BART-based with democratic contrastive learning. Trades slight ROUGE drop vs BRIO for better factual consistency (3.82% G-Eval improvement). Table 2.
06 | TriSum-J | verified | 45.7 | 2024 | Paper | TriSum: Learning Summarization Ability from LLMs with Structured Rationale. NAACL 2024. TriSum-J = joint learning stage. Table 2.
07 | Fourier Transformer | verified | 44.76 | 2023 | Paper, Code | From paper: Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
08 | GLM-XXLarge | verified | 44.7 | 2021 | Paper, Code | From paper: GLM: General Language Model Pretraining with Autoregressive Blank Infilling
09 | HAT-BART | verified | 44.48 | 2021 | Paper | From paper: Hierarchical Learning for Generation with Long Source Sequences
10 | MatchSum (RoBERTa-base) | verified | 44.41 | 2020 | Paper, Code | From paper: Extractive Summarization as Text Matching
11 | Hie-BART | verified | 44.35 | 2021 | Paper | From paper: Hie-BART: Document Summarization with Hierarchical BART
12 | MatchSum (BERT-base) | verified | 44.22 | 2020 | Paper, Code | From paper: Extractive Summarization as Text Matching
13 | BertSumExt | verified | 43.85 | 2019 | Paper, Code | From paper: Text Summarization with Pretrained Encoders
14 | BigBird-Pegasus | verified | 43.84 | 2020 | Paper, Code | From paper: Big Bird: Transformers for Longer Sequences
15 | T5-11B | verified | 43.52 | 2019 | Paper, Code | From paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
16 | SumHiS | verified | 43.48 | 2024 | Paper | SumHiS: Extractive Summarization Exploiting Hidden Structure. arXiv Jun 2024. With semantic filtering. Extractive approach; ROUGE-2 of 32.52 exceeds prior extractive SOTA by 10%. Table 1.
17 | BERTSUM+Transformer | verified | 43.25 | 2019 | Paper, Code | From paper: Fine-tune BERT for Extractive Summarization
18 | UniLM (Abstractive Summarization) | verified | 43.08 | 2019 | Paper, Code | From paper: Unified Language Model Pre-training for Natural Language Understanding and Generation
19 | Selector+Pointer Generator | verified | 41.72 | 2019 | Paper, Code | From paper: Mixture Content Selection for Diverse Sequence Generation
20 | NeuSUM | verified | 41.59 | 2018 | Paper, Code | From paper: Neural Document Summarization by Jointly Learning to Score and Select Sentences
21 | Bottom-Up Sum | verified | 41.22 | 2018 | Paper, Code | From paper: Bottom-Up Abstractive Summarization
22 | Llama-2-70B-chat | verified | 40.98 | 2025 | Paper | Llama-2-70B-chat with 7-shot in-context learning on CNN/DailyMail. Best overall ICL result in arXiv:2507.05123 (Jul 2025). Outperforms zero-shot by substantial margin.
23 | TaLK Convolutions (Deep) | verified | 40.59 | 2020 | Paper, Code | From paper: Time-aware Large Kernel Convolutions
24 | Lead-3 | verified | 40.34 | 2017 | Paper, Code | From paper: Get To The Point: Summarization with Pointer-Generator Networks
25 | TaLK Convolutions (Standard) | verified | 40.03 | 2020 | Paper, Code | From paper: Time-aware Large Kernel Convolutions
26 | ML + RL (Paulus et al., 2017) | verified | 39.87 | 2017 | Paper, Code | From paper: A Deep Reinforced Model for Abstractive Summarization
27 | DynamicConv | verified | 39.84 | 2019 | Paper, Code | From paper: Pay Less Attention with Lightweight and Dynamic Convolutions
28 | LightConv | verified | 39.52 | 2019 | Paper, Code | From paper: Pay Less Attention with Lightweight and Dynamic Convolutions
29 | Synthesizer (R+V) | verified | 38.57 | 2020 | Paper, Code | From paper: Synthesizer: Rethinking Self-Attention in Transformer Models
30 | ML + Intra-Attention (Paulus et al., 2017) | verified | 38.3 | 2017 | Paper, Code | From paper: A Deep Reinforced Model for Abstractive Summarization
31 | Mistral-7B-Instruct-v0.1 | verified | 37.44 | 2025 | Paper | Zero-shot evaluation of Mistral-7B-Instruct-v0.1 on CNN/DailyMail. Best zero-shot 7B result in arXiv:2507.05123 (Jul 2025). Standard ROUGE scoring against reference highlights.
32 | C2F + ALTERNATE | verified | 31.1 | 2017 | Paper | From paper: Coarse-to-Fine Attention Models for Document Summarization
33 | GPT-2 | verified | 29.34 | 2019 | Paper, Code | From paper: Language Models are Unsupervised Multitask Learners

Rouge L

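Rouge L (ROUGE-L, based on the longest common subsequence between a generated summary and its reference) is one of the evaluation metrics reported for CNN / Daily Mail. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.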

Higher is better
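As an illustration of what this metric measures, here is a minimal sketch of a ROUGE-L style F1 built on the longest common subsequence (LCS) of tokens. It assumes lowercased whitespace tokenization and treats each summary as a single sequence; papers on this benchmark often report a summary-level variant computed by an official toolkit, so the sketch is not expected to match the scores listed here. The names `lcs_length` and `rouge_l` are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """F1 of LCS length against candidate and reference lengths."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
# Same words as the reference, but in reverse order.
shuffled = "mat the on sat cat the"
print(rouge_l(reference, reference))  # identical sequences
print(rouge_l(shuffled, reference))   # word order now lowers the score
```

Unlike a unigram overlap, the LCS is sensitive to word order: the reversed candidate shares every word with the reference but only a three-token common subsequence.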

Trust tiers for Rouge L: verified, paper, vendor, community, unverified

Rank | Model | Trust | Score | Year | Links | Notes
01 | Scrambled code + broken (alter) | verified | 45.35 | 2022 | Paper, Code | From paper: Universal Evasion Attacks on Summarization Scoring
02 | BRIO | verified | 44.55 | 2022 | Paper | BRIO: Bringing Order to Abstractive Summarization. ACL 2022.
03 | PEGASUS + SummaReranker | verified | 43.87 | 2022 | Paper, Code | From paper: SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
04 | BRIDO | verified | 42.51 | 2025 | Paper | BRIDO: Bringing Democratic Order to Abstractive Summarization. arXiv Feb 2025. Table 2.
05 | SumHiS | verified | 42.44 | 2024 | Paper | SumHiS: Extractive Summarization Exploiting Hidden Structure. arXiv Jun 2024. With semantic filtering. Table 1.
06 | TriSum-J | verified | 41.9 | 2024 | Paper | TriSum: Learning Summarization Ability from LLMs with Structured Rationale. NAACL 2024. TriSum-J = joint learning stage. Table 2.
07 | HAT-BART | verified | 41.52 | 2021 | Paper | From paper: Hierarchical Learning for Generation with Long Source Sequences
08 | GLM-XXLarge | verified | 41.4 | 2021 | Paper, Code | From paper: GLM: General Language Model Pretraining with Autoregressive Blank Infilling
09 | Fourier Transformer | verified | 41.34 | 2023 | Paper, Code | From paper: Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
10 | Hie-BART | verified | 41.05 | 2021 | Paper | From paper: Hie-BART: Document Summarization with Hierarchical BART
11 | BigBird-Pegasus | verified | 40.74 | 2020 | Paper, Code | From paper: Big Bird: Transformers for Longer Sequences
12 | GPT-3.5-Turbo + TriSum Rationale | verified | 40.7 | 2024 | Paper | GPT-3.5-Turbo prompted with TriSum structured rationale. TriSum paper Table 2.
13 | T5-11B | verified | 40.69 | 2019 | Paper, Code | From paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
14 | MatchSum (RoBERTa-base) | verified | 40.55 | 2020 | Paper, Code | From paper: Extractive Summarization as Text Matching
15 | MatchSum (BERT-base) | verified | 40.38 | 2020 | Paper, Code | From paper: Extractive Summarization as Text Matching
16 | UniLM (Abstractive Summarization) | verified | 40.34 | 2019 | Paper, Code | From paper: Unified Language Model Pre-training for Natural Language Understanding and Generation
17 | BertSumExt | verified | 39.9 | 2019 | Paper, Code | From paper: Text Summarization with Pretrained Encoders
18 | BERTSUM+Transformer | verified | 39.63 | 2019 | Paper, Code | From paper: Fine-tune BERT for Extractive Summarization
19 | Selector+Pointer Generator | verified | 38.79 | 2019 | Paper, Code | From paper: Mixture Content Selection for Diverse Sequence Generation
20 | Bottom-Up Sum | verified | 38.34 | 2018 | Paper, Code | From paper: Bottom-Up Abstractive Summarization
21 | NeuSUM | verified | 37.98 | 2018 | Paper, Code | From paper: Neural Document Summarization by Jointly Learning to Score and Select Sentences
22 | ML + RL (Paulus et al., 2017) | verified | 36.9 | 2017 | Paper, Code | From paper: A Deep Reinforced Model for Abstractive Summarization
23 | TaLK Convolutions (Deep) | verified | 36.81 | 2020 | Paper, Code | From paper: Time-aware Large Kernel Convolutions
24 | DynamicConv | verified | 36.73 | 2019 | Paper, Code | From paper: Pay Less Attention with Lightweight and Dynamic Convolutions
25 | Lead-3 | verified | 36.57 | 2017 | Paper, Code | From paper: Get To The Point: Summarization with Pointer-Generator Networks
26 | LightConv | verified | 36.51 | 2019 | Paper, Code | From paper: Pay Less Attention with Lightweight and Dynamic Convolutions
27 | TaLK Convolutions (Standard) | verified | 36.13 | 2020 | Paper, Code | From paper: Time-aware Large Kernel Convolutions
28 | Synthesizer (R+V) | verified | 35.95 | 2020 | Paper, Code | From paper: Synthesizer: Rethinking Self-Attention in Transformer Models
29 | ML + Intra-Attention (Paulus et al., 2017) | verified | 35.49 | 2017 | Paper, Code | From paper: A Deep Reinforced Model for Abstractive Summarization
30 | C2F + ALTERNATE | verified | 28.8 | 2017 | Paper | From paper: Coarse-to-Fine Attention Models for Document Summarization
31 | Llama-2-70B-chat | verified | 27.52 | 2025 | Paper | Llama-2-70B-chat with 7-shot ICL on CNN/DailyMail. arXiv:2507.05123 (Jul 2025).
32 | GPT-2 | verified | 26.58 | 2019 | Paper, Code | From paper: Language Models are Unsupervised Multitask Learners
33 | Mistral-7B-Instruct-v0.1 | verified | 24.53 | 2025 | Paper | Zero-shot evaluation of Mistral-7B-Instruct-v0.1 on CNN/DailyMail. arXiv:2507.05123 (Jul 2025).

Ppl

Ppl (perplexity) is one of the evaluation metrics reported for CNN / Daily Mail. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Lower is better

Trust tiers for Ppl: verified, paper, vendor, community, unverified

Rank | Model | Trust | Score | Year | Links | Notes
01 | C2F + ALTERNATE | verified | 23.6 | 2017 | Paper | From paper: Coarse-to-Fine Attention Models for Document Summarization
02 | Bottom-Up Sum | verified | 32.75 | 2018 | Paper, Code | From paper: Bottom-Up Abstractive Summarization
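Perplexity is the exponential of the average negative log-likelihood a model assigns to each token of the reference text, so lower values mean the model found the text less surprising. The sketch below assumes the per-token natural-log probabilities are already available as a list; the function name `perplexity` is illustrative, and the papers above may compute it over different vocabularies or tokenizations, which limits how directly their values can be compared.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that assigns probability 1/4 to every token of a 5-token text.
uniform = [math.log(0.25)] * 5
# A more confident model that assigns probability 1/2 to every token.
confident = [math.log(0.5)] * 5

print(perplexity(uniform))
print(perplexity(confident))
```

The more confident model ends up with the lower perplexity, which is why this metric ranks in the opposite direction from the ROUGE scores.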

Rouge 2

Rouge 2 (ROUGE-2, the bigram overlap between a generated summary and its reference) is one of the evaluation metrics reported for CNN / Daily Mail. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Rouge 2: verified, paper, vendor, community, unverified

Rank | Model | Trust | Score | Year | Links | Notes
01 | SumHiS | verified | 32.52 | 2024 | Paper | SumHiS: Extractive Summarization Exploiting Hidden Structure. arXiv Jun 2024. SOTA ROUGE-2 for extractive models, exceeding prior best by 10%. Table 1.
02 | BRIO | verified | 23.75 | 2022 | Paper | BRIO: Bringing Order to Abstractive Summarization. ACL 2022.
03 | GPT-3.5-Turbo + TriSum Rationale | verified | 23.5 | 2024 | Paper | GPT-3.5-Turbo prompted with TriSum structured rationale. TriSum paper Table 2.
04 | BRIDO | verified | 22.95 | 2025 | Paper | BRIDO: Bringing Democratic Order to Abstractive Summarization. arXiv Feb 2025. Table 2.
05 | TriSum-J | verified | 22.7 | 2024 | Paper | TriSum: Learning Summarization Ability from LLMs with Structured Rationale. NAACL 2024. TriSum-J = joint learning stage. Table 2.
06 | PEGASUS + SummaReranker | verified | 22.55 | 2022 | Paper, Code | From paper: SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
07 | Fourier Transformer | verified | 21.55 | 2023 | Paper, Code | From paper: Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
08 | T5-11B | verified | 21.55 | 2019 | Paper, Code | From paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
09 | GLM-XXLarge | verified | 21.4 | 2021 | Paper, Code | From paper: GLM: General Language Model Pretraining with Autoregressive Blank Infilling
10 | Hie-BART | verified | 21.37 | 2021 | Paper | From paper: Hie-BART: Document Summarization with Hierarchical BART
11 | HAT-BART | verified | 21.31 | 2021 | Paper | From paper: Hierarchical Learning for Generation with Long Source Sequences
12 | BigBird-Pegasus | verified | 21.11 | 2020 | Paper, Code | From paper: Big Bird: Transformers for Longer Sequences
13 | MatchSum (RoBERTa-base) | verified | 20.86 | 2020 | Paper, Code | From paper: Extractive Summarization as Text Matching
14 | MatchSum (BERT-base) | verified | 20.62 | 2020 | Paper, Code | From paper: Extractive Summarization as Text Matching
15 | UniLM (Abstractive Summarization) | verified | 20.43 | 2019 | Paper, Code | From paper: Unified Language Model Pre-training for Natural Language Understanding and Generation
16 | BertSumExt | verified | 20.34 | 2019 | Paper, Code | From paper: Text Summarization with Pretrained Encoders
17 | BERTSUM+Transformer | verified | 20.24 | 2019 | Paper, Code | From paper: Fine-tune BERT for Extractive Summarization
18 | Scrambled code + broken (alter) | verified | 19.84 | 2022 | Paper, Code | From paper: Universal Evasion Attacks on Summarization Scoring
19 | NeuSUM | verified | 19.01 | 2018 | Paper, Code | From paper: Neural Document Summarization by Jointly Learning to Score and Select Sentences
20 | TaLK Convolutions (Deep) | verified | 18.97 | 2020 | Paper, Code | From paper: Time-aware Large Kernel Convolutions
21 | Selector+Pointer Generator | verified | 18.74 | 2019 | Paper, Code | From paper: Mixture Content Selection for Diverse Sequence Generation
22 | Bottom-Up Sum | verified | 18.68 | 2018 | Paper, Code | From paper: Bottom-Up Abstractive Summarization
23 | TaLK Convolutions (Standard) | verified | 18.45 | 2020 | Paper, Code | From paper: Time-aware Large Kernel Convolutions
24 | Lead-3 | verified | 17.7 | 2017 | Paper, Code | From paper: Get To The Point: Summarization with Pointer-Generator Networks
25 | Llama-2-70B-chat | verified | 17.23 | 2025 | Paper | Llama-2-70B-chat with 7-shot ICL on CNN/DailyMail. arXiv:2507.05123 (Jul 2025).
26 | Mistral-7B-Instruct-v0.1 | verified | 16.42 | 2025 | Paper | Zero-shot evaluation of Mistral-7B-Instruct-v0.1 on CNN/DailyMail. arXiv:2507.05123 (Jul 2025).
27 | DynamicConv | verified | 16.25 | 2019 | Paper, Code | From paper: Pay Less Attention with Lightweight and Dynamic Convolutions
28 | Synthesizer (R+V) | verified | 16.24 | 2020 | Paper, Code | From paper: Synthesizer: Rethinking Self-Attention in Transformer Models
29 | LightConv | verified | 15.97 | 2019 | Paper, Code | From paper: Pay Less Attention with Lightweight and Dynamic Convolutions
30 | ML + RL (Paulus et al., 2017) | verified | 15.82 | 2017 | Paper, Code | From paper: A Deep Reinforced Model for Abstractive Summarization
31 | C2F + ALTERNATE | verified | 15.4 | 2017 | Paper | From paper: Coarse-to-Fine Attention Models for Document Summarization
32 | ML + Intra-Attention (Paulus et al., 2017) | verified | 14.81 | 2017 | Paper, Code | From paper: A Deep Reinforced Model for Abstractive Summarization