Text Summarization

CNN / Daily Mail

Dataset from Papers With Code

Metrics: ppl, rouge-1, rouge-2, rouge-l

Saturated benchmark. Last significant update: Jun 2022

ROUGE-based evaluation on this benchmark is saturated, with no significant improvements since 2022. Modern summarization work instead relies on LLM-as-judge protocols (e.g. G-Eval), human preference evaluations, or factual consistency metrics.
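For context on what the rouge-1 and rouge-2 tables below measure: ROUGE-N scores n-gram overlap between a candidate summary and a reference. Real implementations differ in details (stemming, tokenization, recall-only vs F1 reporting), so this is only a minimal illustrative sketch of ROUGE-N F1, not the scorer used for the leaderboard numbers:

```python
from collections import Counter

def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """Minimal ROUGE-N F1 sketch: clipped n-gram overlap between a
    candidate summary and a single reference (no stemming)."""
    def ngrams(text: str, n: int) -> Counter:
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not cand or not ref:
        return 0.0
    # Counter intersection clips each n-gram's count at the minimum of the two sides
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the score is pure surface overlap, outputs with no semantic content can still score well if they share n-grams with references, which is one reason ROUGE is considered gameable and saturated here.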

ppl (perplexity)

#    Model              Score   Date       Paper / Code
1    Bottom-Up Sum      32.75   Aug 2018
2    C2F + ALTERNATE    23.6    Sep 2017   Coarse-to-Fine Attention Models for Document Summarization
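The ppl column reports perplexity, the exponentiated average negative log-likelihood a language model assigns to held-out text (lower means the model is less "surprised"). A minimal sketch of the definition, assuming per-token natural-log probabilities as input:

```python
import math

def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood) over the
    natural-log probabilities a model assigns to each token."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model assigning probability 0.25 to every token has perplexity 4:
# it is as uncertain as a uniform choice over 4 tokens.
```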

rouge-1

#    Model                                        Score   Date       Paper / Code
1    Scrambled code + broken (alter)              48.18   Oct 2022
2    PEGASUS + SummaReranker                      47.16   Mar 2022
3    Fourier Transformer                          44.76   May 2023
4    GLM-XXLarge                                  44.7    Mar 2021
5    HAT-BART                                     44.48   Apr 2021
6    MatchSum (RoBERTa-base)                      44.41   Apr 2020
7    Hie-BART                                     44.35   Jun 2021   Hie-BART: Document Summarization with Hierarchical BART
8    MatchSum (BERT-base)                         44.22   Apr 2020
9    BertSumExt                                   43.85   Aug 2019
10   BigBird-Pegasus                              43.84   Jul 2020
11   T5-11B                                       43.52   Oct 2019
12   BERTSUM+Transformer                          43.25   Mar 2019
13   UniLM (Abstractive Summarization)            43.08   May 2019
14   Selector+Pointer Generator                   41.72   Sep 2019
15   NeuSUM                                       41.59   Jul 2018
16   Bottom-Up Sum                                41.22   Aug 2018
17   TaLK Convolutions (Deep)                     40.59   Feb 2020
18   Lead-3                                       40.34   Apr 2017
19   TaLK Convolutions (Standard)                 40.03   Feb 2020
20   ML + RL (Paulus et al., 2017)                39.87   May 2017
21   DynamicConv                                  39.84   Jan 2019
22   LightConv                                    39.52   Jan 2019
23   Synthesizer (R+V)                            38.57   May 2020
24   ML + Intra-Attention (Paulus et al., 2017)   38.3    May 2017
25   C2F + ALTERNATE                              31.1    Sep 2017   Coarse-to-Fine Attention Models for Document Summarization
26   GPT-2                                        29.34   Feb 2019   Language Models are Unsupervised Multitask Learners

rouge-2

#    Model                                        Score   Date       Paper / Code
1    PEGASUS + SummaReranker                      22.55   Mar 2022
2    Fourier Transformer                          21.55   May 2023
3    T5-11B                                       21.55   Oct 2019
4    GLM-XXLarge                                  21.4    Mar 2021
5    Hie-BART                                     21.37   Jun 2021   Hie-BART: Document Summarization with Hierarchical BART
6    HAT-BART                                     21.31   Apr 2021
7    BigBird-Pegasus                              21.11   Jul 2020
8    MatchSum (RoBERTa-base)                      20.86   Apr 2020
9    MatchSum (BERT-base)                         20.62   Apr 2020
10   UniLM (Abstractive Summarization)            20.43   May 2019
11   BertSumExt                                   20.34   Aug 2019
12   BERTSUM+Transformer                          20.24   Mar 2019
13   Scrambled code + broken (alter)              19.84   Oct 2022
14   NeuSUM                                       19.01   Jul 2018
15   TaLK Convolutions (Deep)                     18.97   Feb 2020
16   Selector+Pointer Generator                   18.74   Sep 2019
17   Bottom-Up Sum                                18.68   Aug 2018
18   TaLK Convolutions (Standard)                 18.45   Feb 2020
19   Lead-3                                       17.7    Apr 2017
20   DynamicConv                                  16.25   Jan 2019
21   Synthesizer (R+V)                            16.24   May 2020
22   LightConv                                    15.97   Jan 2019
23   ML + RL (Paulus et al., 2017)                15.82   May 2017
24   C2F + ALTERNATE                              15.4    Sep 2017   Coarse-to-Fine Attention Models for Document Summarization
25   ML + Intra-Attention (Paulus et al., 2017)   14.81   May 2017
26   GPT-2                                        8.27    Feb 2019   Language Models are Unsupervised Multitask Learners

rouge-l

#    Model                                        Score   Date       Paper / Code
1    Scrambled code + broken (alter)              45.35   Oct 2022
2    PEGASUS + SummaReranker                      43.87   Mar 2022
3    HAT-BART                                     41.52   Apr 2021
4    GLM-XXLarge                                  41.4    Mar 2021
5    Fourier Transformer                          41.34   May 2023
6    Hie-BART                                     41.05   Jun 2021   Hie-BART: Document Summarization with Hierarchical BART
7    BigBird-Pegasus                              40.74   Jul 2020
8    T5-11B                                       40.69   Oct 2019
9    MatchSum (RoBERTa-base)                      40.55   Apr 2020
10   MatchSum (BERT-base)                         40.38   Apr 2020
11   UniLM (Abstractive Summarization)            40.34   May 2019
12   BertSumExt                                   39.9    Aug 2019
13   BERTSUM+Transformer                          39.63   Mar 2019
14   Selector+Pointer Generator                   38.79   Sep 2019
15   Bottom-Up Sum                                38.34   Aug 2018
16   NeuSUM                                       37.98   Jul 2018
17   ML + RL (Paulus et al., 2017)                36.9    May 2017
18   TaLK Convolutions (Deep)                     36.81   Feb 2020
19   DynamicConv                                  36.73   Jan 2019
20   Lead-3                                       36.57   Apr 2017
21   LightConv                                    36.51   Jan 2019
22   TaLK Convolutions (Standard)                 36.13   Feb 2020
23   Synthesizer (R+V)                            35.95   May 2020
24   ML + Intra-Attention (Paulus et al., 2017)   35.49   May 2017
25   C2F + ALTERNATE                              28.8    Sep 2017   Coarse-to-Fine Attention Models for Document Summarization
26   GPT-2                                        26.58   Feb 2019   Language Models are Unsupervised Multitask Learners
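Unlike rouge-1 and rouge-2, the rouge-l metric above scores the longest common subsequence (LCS) between candidate and reference, so it rewards in-order word matches without requiring them to be contiguous. A minimal illustrative F1 sketch (real scorers add stemming, tokenization rules, and sentence-level variants):

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """Minimal ROUGE-L F1 sketch over whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```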

Related Papers (19)

Universal Evasion Attacks on Summarization Scoring
Oct 2022. Models: Scrambled code + broken (alter)

Extractive Summarization as Text Matching
Apr 2020. Models: MatchSum (RoBERTa-base), MatchSum (BERT-base)

Time-aware Large Kernel Convolutions
Feb 2020. Models: TaLK Convolutions (Deep), TaLK Convolutions (Standard)

Fine-tune BERT for Extractive Summarization
Mar 2019. Models: BERTSUM+Transformer

A Deep Reinforced Model for Abstractive Summarization
May 2017. Models: ML + RL (Paulus et al., 2017), ML + Intra-Attention (Paulus et al., 2017)
