CNN / Daily Mail
Dataset from Papers With Code
Metrics: ppl, rouge-1, rouge-2, rouge-l
Saturated benchmark. Last significant update: Jun 2022
ROUGE-based evaluation on this benchmark is saturated: there have been no significant improvements since 2022. Modern summarization work instead relies on LLM-as-judge evaluation (e.g., G-Eval), human preference studies, or factual consistency metrics.
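The ROUGE leaderboards below report F-measure scaled by 100. For reference, such scores can be computed with Google's `rouge-score` package; a minimal sketch (the example texts here are invented):

```python
# pip install rouge-score
from rouge_score import rouge_scorer

# Score one candidate summary against one reference with the same
# metric variants used in the leaderboards below.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "Police arrested a suspect after a robbery in the downtown area."
candidate = "A suspect was arrested by police following a downtown robbery."

# score(target, prediction) returns precision/recall/F-measure per metric.
for name, result in scorer.score(reference, candidate).items():
    print(f"{name}: F1 = {100 * result.fmeasure:.2f}")
```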
Perplexity (ppl)
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | Bottom-Up Sum | 32.75 | Bottom-Up Abstractive Summarization | Aug 2018 |
| 2 | C2F + ALTERNATE | 23.6 | Coarse-to-Fine Attention Models for Document Summarization | Sep 2017 |
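Perplexity here is whatever the cited papers measured for their own models; for orientation, the quantity itself is the exponential of the mean per-token negative log-likelihood. A minimal sketch of computing it with an off-the-shelf GPT-2 via Hugging Face `transformers` (the model choice is an assumption for illustration, not the setup used by either paper):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Perplexity = exp(mean cross-entropy per token).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "Police arrested a suspect after a robbery in the downtown area."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity = {math.exp(loss.item()):.2f}")
```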
ROUGE-1
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | Scrambled code + broken (alter) | 48.18 | Universal Evasion Attacks on Summarization Scoring | Oct 2022 |
| 2 | PEGASUS + SummaReranker | 47.16 | SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization | Mar 2022 |
| 3 | Fourier Transformer | 44.76 | Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator | May 2023 |
| 4 | GLM-XXLarge | 44.7 | GLM: General Language Model Pretraining with Autoregressive Blank Infilling | Mar 2021 |
| 5 | HAT-BART | 44.48 | Hierarchical Learning for Generation with Long Source Sequences | Apr 2021 |
| 6 | MatchSum (RoBERTa-base) | 44.41 | Extractive Summarization as Text Matching | Apr 2020 |
| 7 | Hie-BART | 44.35 | Hie-BART: Document Summarization with Hierarchical BART | Jun 2021 |
| 8 | MatchSum (BERT-base) | 44.22 | Extractive Summarization as Text Matching | Apr 2020 |
| 9 | BertSumExt | 43.85 | Text Summarization with Pretrained Encoders | Aug 2019 |
| 10 | BigBird-Pegasus | 43.84 | Big Bird: Transformers for Longer Sequences | Jul 2020 |
| 11 | T5-11B | 43.52 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Oct 2019 |
| 12 | BERTSUM+Transformer | 43.25 | Fine-tune BERT for Extractive Summarization | Mar 2019 |
| 13 | UniLM (Abstractive Summarization) | 43.08 | Unified Language Model Pre-training for Natural Language Understanding and Generation | May 2019 |
| 14 | Selector+Pointer Generator | 41.72 | Mixture Content Selection for Diverse Sequence Generation | Sep 2019 |
| 15 | NeuSUM | 41.59 | Neural Document Summarization by Jointly Learning to Score and Select Sentences | Jul 2018 |
| 16 | Bottom-Up Sum | 41.22 | Bottom-Up Abstractive Summarization | Aug 2018 |
| 17 | TaLK Convolutions (Deep) | 40.59 | Time-aware Large Kernel Convolutions | Feb 2020 |
| 18 | Lead-3 | 40.34 | Get To The Point: Summarization with Pointer-Generator Networks | Apr 2017 |
| 19 | TaLK Convolutions (Standard) | 40.03 | Time-aware Large Kernel Convolutions | Feb 2020 |
| 20 | ML + RL (Paulus et al., 2017) | 39.87 | A Deep Reinforced Model for Abstractive Summarization | May 2017 |
| 21 | DynamicConv | 39.84 | Pay Less Attention with Lightweight and Dynamic Convolutions | Jan 2019 |
| 22 | LightConv | 39.52 | Pay Less Attention with Lightweight and Dynamic Convolutions | Jan 2019 |
| 23 | Synthesizer (R+V) | 38.57 | Synthesizer: Rethinking Self-Attention in Transformer Models | May 2020 |
| 24 | ML + Intra-Attention (Paulus et al., 2017) | 38.3 | A Deep Reinforced Model for Abstractive Summarization | May 2017 |
| 25 | C2F + ALTERNATE | 31.1 | Coarse-to-Fine Attention Models for Document Summarization | Sep 2017 |
| 26 | GPT-2 | 29.34 | Language Models are Unsupervised Multitask Learners | Feb 2019 |
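The Lead-3 baseline above simply copies the first three sentences of the article, yet it outscores several trained systems. A sketch of reproducing it with the Hugging Face `datasets` copy of the corpus (the dataset id `cnn_dailymail` and the `article`/`highlights` field names are assumptions based on the hub version):

```python
# pip install datasets rouge-score nltk
import nltk
from datasets import load_dataset
from rouge_score import rouge_scorer

nltk.download("punkt", quiet=True)

# The hub copy exposes "article" (source) and "highlights" (reference summary).
data = load_dataset("cnn_dailymail", "3.0.0", split="test")
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)

total, n = 0.0, 500  # small sample for a quick estimate, not the full test set
for example in data.select(range(n)):
    lead3 = " ".join(nltk.sent_tokenize(example["article"])[:3])
    total += scorer.score(example["highlights"], lead3)["rouge1"].fmeasure

print(f"Lead-3 ROUGE-1 F1 on {n} test articles: {100 * total / n:.2f}")
```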
ROUGE-2
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | PEGASUS + SummaReranker | 22.55 | SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization | Mar 2022 |
| 2 | Fourier Transformer | 21.55 | Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator | May 2023 |
| 3 | T5-11B | 21.55 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Oct 2019 |
| 4 | GLM-XXLarge | 21.4 | GLM: General Language Model Pretraining with Autoregressive Blank Infilling | Mar 2021 |
| 5 | Hie-BART | 21.37 | Hie-BART: Document Summarization with Hierarchical BART | Jun 2021 |
| 6 | HAT-BART | 21.31 | Hierarchical Learning for Generation with Long Source Sequences | Apr 2021 |
| 7 | BigBird-Pegasus | 21.11 | Big Bird: Transformers for Longer Sequences | Jul 2020 |
| 8 | MatchSum (RoBERTa-base) | 20.86 | Extractive Summarization as Text Matching | Apr 2020 |
| 9 | MatchSum (BERT-base) | 20.62 | Extractive Summarization as Text Matching | Apr 2020 |
| 10 | UniLM (Abstractive Summarization) | 20.43 | Unified Language Model Pre-training for Natural Language Understanding and Generation | May 2019 |
| 11 | BertSumExt | 20.34 | Text Summarization with Pretrained Encoders | Aug 2019 |
| 12 | BERTSUM+Transformer | 20.24 | Fine-tune BERT for Extractive Summarization | Mar 2019 |
| 13 | Scrambled code + broken (alter) | 19.84 | Universal Evasion Attacks on Summarization Scoring | Oct 2022 |
| 14 | NeuSUM | 19.01 | Neural Document Summarization by Jointly Learning to Score and Select Sentences | Jul 2018 |
| 15 | TaLK Convolutions (Deep) | 18.97 | Time-aware Large Kernel Convolutions | Feb 2020 |
| 16 | Selector+Pointer Generator | 18.74 | Mixture Content Selection for Diverse Sequence Generation | Sep 2019 |
| 17 | Bottom-Up Sum | 18.68 | Bottom-Up Abstractive Summarization | Aug 2018 |
| 18 | TaLK Convolutions (Standard) | 18.45 | Time-aware Large Kernel Convolutions | Feb 2020 |
| 19 | Lead-3 | 17.7 | Get To The Point: Summarization with Pointer-Generator Networks | Apr 2017 |
| 20 | DynamicConv | 16.25 | Pay Less Attention with Lightweight and Dynamic Convolutions | Jan 2019 |
| 21 | Synthesizer (R+V) | 16.24 | Synthesizer: Rethinking Self-Attention in Transformer Models | May 2020 |
| 22 | LightConv | 15.97 | Pay Less Attention with Lightweight and Dynamic Convolutions | Jan 2019 |
| 23 | ML + RL (Paulus et al., 2017) | 15.82 | A Deep Reinforced Model for Abstractive Summarization | May 2017 |
| 24 | C2F + ALTERNATE | 15.4 | Coarse-to-Fine Attention Models for Document Summarization | Sep 2017 |
| 25 | ML + Intra-Attention (Paulus et al., 2017) | 14.81 | A Deep Reinforced Model for Abstractive Summarization | May 2017 |
| 26 | GPT-2 | 8.27 | Language Models are Unsupervised Multitask Learners | Feb 2019 |
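ROUGE-2 counts overlapping bigrams between candidate and reference. The official scorer adds stemming and tokenization details, but the core computation reduces to clipped bigram counts, as in this illustrative toy (not the official implementation):

```python
from collections import Counter

def rouge_2_f1(reference: str, candidate: str) -> float:
    """Toy ROUGE-2 F1: clipped bigram overlap between two texts."""
    def bigrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(zip(tokens, tokens[1:]))

    ref, cand = bigrams(reference), bigrams(candidate)
    overlap = sum((ref & cand).values())  # clipped counts
    if not ref or not cand or overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge_2_f1("the cat sat on the mat", "the cat sat on a mat"))  # 0.6
```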
ROUGE-L
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | Scrambled code + broken (alter) | 45.35 | Universal Evasion Attacks on Summarization Scoring | Oct 2022 |
| 2 | PEGASUS + SummaReranker | 43.87 | SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization | Mar 2022 |
| 3 | HAT-BART | 41.52 | Hierarchical Learning for Generation with Long Source Sequences | Apr 2021 |
| 4 | GLM-XXLarge | 41.4 | GLM: General Language Model Pretraining with Autoregressive Blank Infilling | Mar 2021 |
| 5 | Fourier Transformer | 41.34 | Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator | May 2023 |
| 6 | Hie-BART | 41.05 | Hie-BART: Document Summarization with Hierarchical BART | Jun 2021 |
| 7 | BigBird-Pegasus | 40.74 | Big Bird: Transformers for Longer Sequences | Jul 2020 |
| 8 | T5-11B | 40.69 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Oct 2019 |
| 9 | MatchSum (RoBERTa-base) | 40.55 | Extractive Summarization as Text Matching | Apr 2020 |
| 10 | MatchSum (BERT-base) | 40.38 | Extractive Summarization as Text Matching | Apr 2020 |
| 11 | UniLM (Abstractive Summarization) | 40.34 | Unified Language Model Pre-training for Natural Language Understanding and Generation | May 2019 |
| 12 | BertSumExt | 39.9 | Text Summarization with Pretrained Encoders | Aug 2019 |
| 13 | BERTSUM+Transformer | 39.63 | Fine-tune BERT for Extractive Summarization | Mar 2019 |
| 14 | Selector+Pointer Generator | 38.79 | Mixture Content Selection for Diverse Sequence Generation | Sep 2019 |
| 15 | Bottom-Up Sum | 38.34 | Bottom-Up Abstractive Summarization | Aug 2018 |
| 16 | NeuSUM | 37.98 | Neural Document Summarization by Jointly Learning to Score and Select Sentences | Jul 2018 |
| 17 | ML + RL (Paulus et al., 2017) | 36.9 | A Deep Reinforced Model for Abstractive Summarization | May 2017 |
| 18 | TaLK Convolutions (Deep) | 36.81 | Time-aware Large Kernel Convolutions | Feb 2020 |
| 19 | DynamicConv | 36.73 | Pay Less Attention with Lightweight and Dynamic Convolutions | Jan 2019 |
| 20 | Lead-3 | 36.57 | Get To The Point: Summarization with Pointer-Generator Networks | Apr 2017 |
| 21 | LightConv | 36.51 | Pay Less Attention with Lightweight and Dynamic Convolutions | Jan 2019 |
| 22 | TaLK Convolutions (Standard) | 36.13 | Time-aware Large Kernel Convolutions | Feb 2020 |
| 23 | Synthesizer (R+V) | 35.95 | Synthesizer: Rethinking Self-Attention in Transformer Models | May 2020 |
| 24 | ML + Intra-Attention (Paulus et al., 2017) | 35.49 | A Deep Reinforced Model for Abstractive Summarization | May 2017 |
| 25 | C2F + ALTERNATE | 28.8 | Coarse-to-Fine Attention Models for Document Summarization | Sep 2017 |
| 26 | GPT-2 | 26.58 | Language Models are Unsupervised Multitask Learners | Feb 2019 |
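Unlike ROUGE-1/2, ROUGE-L scores the longest common subsequence (LCS), so it rewards words that match in order even when they are not contiguous. An illustrative toy version of the LCS F-measure (again, not the official implementation):

```python
def rouge_l_f1(reference: str, candidate: str) -> float:
    """Toy ROUGE-L F1 via a standard dynamic-programming LCS."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    # dp[i][j] = LCS length of ref[:i] and cand[:j]
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, c in enumerate(cand, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r == c else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("the cat sat on the mat", "the cat lay on the mat"))  # ~0.833
```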
Related Papers
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
May 2023. Models: Fourier Transformer
Universal Evasion Attacks on Summarization Scoring
Oct 2022. Models: Scrambled code + broken (alter)
SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
Mar 2022. Models: PEGASUS + SummaReranker
Hierarchical Learning for Generation with Long Source Sequences
Apr 2021. Models: HAT-BART
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Mar 2021. Models: GLM-XXLarge
Big Bird: Transformers for Longer Sequences
Jul 2020. Models: BigBird-Pegasus
Synthesizer: Rethinking Self-Attention in Transformer Models
May 2020. Models: Synthesizer (R+V)
Extractive Summarization as Text Matching
Apr 2020. Models: MatchSum (RoBERTa-base), MatchSum (BERT-base)
Time-aware Large Kernel Convolutions
Feb 2020. Models: TaLK Convolutions (Deep), TaLK Convolutions (Standard)
Mixture Content Selection for Diverse Sequence Generation
Sep 2019. Models: Selector+Pointer Generator
Text Summarization with Pretrained Encoders
Aug 2019. Models: BertSumExt
Unified Language Model Pre-training for Natural Language Understanding and Generation
May 2019. Models: UniLM (Abstractive Summarization)
Fine-tune BERT for Extractive Summarization
Mar 2019. Models: BERTSUM+Transformer
Pay Less Attention with Lightweight and Dynamic Convolutions
Jan 2019. Models: DynamicConv, LightConv
Bottom-Up Abstractive Summarization
Aug 2018. Models: Bottom-Up Sum
Neural Document Summarization by Jointly Learning to Score and Select Sentences
Jul 2018. Models: NeuSUM
A Deep Reinforced Model for Abstractive Summarization
May 2017. Models: ML + RL (Paulus et al., 2017), ML + Intra-Attention (Paulus et al., 2017)
Get To The Point: Summarization with Pointer-Generator Networks
Apr 2017. Models: Lead-3