Scene Text Recognition | 2020 | en

SVT (Street View Text)

Dataset from Papers With Code

Metrics: accuracy, cer, wer, f1
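The page names these metrics but does not define them. As a rough illustration, word accuracy and character error rate (CER) are commonly computed as below. This is a minimal sketch under stated assumptions: predictions are compared to references by exact string match with no normalization (many scene text benchmarks instead compare case-insensitively on alphanumeric characters), and the function names `edit_distance`, `word_accuracy`, and `cer` are hypothetical, not taken from any leaderboard tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_accuracy(preds, refs):
    """Percentage of predictions that equal their reference exactly (assumed definition)."""
    correct = sum(p == r for p, r in zip(preds, refs))
    return 100.0 * correct / len(refs)


def cer(preds, refs):
    """Total character edits divided by total reference characters (assumed definition)."""
    edits = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    chars = sum(len(r) for r in refs)
    return edits / chars


# Made-up example strings, for illustration only
preds = ["hotel", "cafe", "0pen"]
refs = ["hotel", "cafe", "open"]
print(word_accuracy(preds, refs), cer(preds, refs))
```

Under these definitions a single wrong character makes the whole word count as an error for accuracy, while CER only charges one edit, which is why the two metrics can diverge on near-miss predictions.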
Current State of the Art

CLIP4STR-H (DFN-5B)

Unknown

99.1

accuracy

accuracy Progress Over Time

Showing 13 breakthroughs from Jun 2014 to May 2023

[Chart: accuracy (y-axis) by date (x-axis), Jun 2014 to May 2023]

Key Milestones

Jun 2014
CHAR

From paper: Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

68.0
Jul 2015
CRNN

From paper: An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

80.8
+18.8%
Mar 2016
RARE

From paper: Robust Scene Text Recognition with Automatic Rectification

81.9
+1.4%
Sep 2016
STAR-Net

From paper: STAR-Net: A Spatial Attention Residue Network for Scene Text Recognition

83.6
+2.1%
Jun 2018
ASTER

From paper: ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

89.5
+7.1%
Oct 2019
SATRN

From paper: On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention

91.3
+2.0%
Mar 2020
SRN

From paper: Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

91.5
+0.2%
Jun 2021
RCEED

From paper: Representation and Correlation Enhanced Encoder-Decoder Framework for Scene Text Recognition

91.8
+0.3%
Jul 2021
Yet Another Text Recognizer

From paper: Why You Should Try the Real Data for the Scene Text Recognition

94.7
+3.2%
Nov 2021
MATRN

From paper: Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features

95.0
+0.3%
Dec 2021
S-GTR

From paper: Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition

95.8
+0.8%
Sep 2022
MGP-STR

From paper: Multi-Granularity Prediction for Scene Text Recognition

98.6
+2.9%
May 2023
CLIP4STR-H (DFN-5B) (Current SOTA)

From paper: CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

99.1
+0.5%
Total Improvement
45.7%
Time Span
8y 11m
Breakthroughs
13
Current SOTA
99.1
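
The percentage shown under each milestone appears to be the relative gain over the previous milestone's score rather than a difference in accuracy points (CRNN, for example, is 12.8 points above CHAR but is listed as +18.8%). This reading is inferred from the numbers on the page, not stated by it. A minimal sketch that reproduces the listed figures under that assumption, with `relative_gain` as a hypothetical helper name:

```python
# Milestone scores copied from the Key Milestones list above
milestones = [
    ("CHAR", 68.0), ("CRNN", 80.8), ("RARE", 81.9), ("STAR-Net", 83.6),
    ("ASTER", 89.5), ("SATRN", 91.3), ("SRN", 91.5), ("RCEED", 91.8),
    ("Yet Another Text Recognizer", 94.7), ("MATRN", 95.0), ("S-GTR", 95.8),
    ("MGP-STR", 98.6), ("CLIP4STR-H (DFN-5B)", 99.1),
]


def relative_gain(prev: float, curr: float) -> float:
    """Percent change from prev to curr, rounded to one decimal (assumed formula)."""
    return round(100.0 * (curr - prev) / prev, 1)


# Gain of each milestone over the one before it
gains = [relative_gain(p[1], c[1]) for p, c in zip(milestones, milestones[1:])]

# Gain of the latest milestone over the first
total = relative_gain(milestones[0][1], milestones[-1][1])

for (name, _), gain in zip(milestones[1:], gains):
    print(f"{name}: +{gain}%")
print(f"Total improvement: {total}%")
```

Because the gains are relative to a rising baseline, the per-milestone percentages do not sum to the total; the total is computed directly from the first and last scores.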

Top Models Performance Comparison

Top 10 models ranked by accuracy

Rank  Model                       accuracy  % of best
1     CLIP4STR-H (DFN-5B)         99.1      100.0%
2     DTrOCR 105M                 98.9      99.8%
3     MGP-STR                     98.6      99.5%
4     CLIP4STR-L (DataComp-1B)    98.6      99.5%
5     CPPD                        98.5      99.4%
6     CLIP4STR-L                  98.5      99.4%
7     CLIP4STR-B*                 98.3      99.2%
8     CCD-ViT-Base(ARD_2.8M)      97.8      98.7%
9     CCD-ViT-Small(ARD_2.8M)     96.4      97.3%
10    CCD-ViT-Tiny(ARD_2.8M)      96.0      96.9%
Best Score
99.1
Top Model
CLIP4STR-H (DFN-5B)
Models Compared
10
Score Range
3.1
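
The "% of best" and "Score Range" figures above can be recomputed from the top-10 scores. The sketch below assumes "% of best" means each score divided by the highest score, and "Score Range" means the highest minus the lowest score among the ten compared models; both definitions are inferred from the numbers rather than documented on the page.

```python
# Top-10 accuracy scores copied from the comparison above
top10 = [
    ("CLIP4STR-H (DFN-5B)", 99.1), ("DTrOCR 105M", 98.9), ("MGP-STR", 98.6),
    ("CLIP4STR-L (DataComp-1B)", 98.6), ("CPPD", 98.5), ("CLIP4STR-L", 98.5),
    ("CLIP4STR-B*", 98.3), ("CCD-ViT-Base(ARD_2.8M)", 97.8),
    ("CCD-ViT-Small(ARD_2.8M)", 96.4), ("CCD-ViT-Tiny(ARD_2.8M)", 96.0),
]

best = max(score for _, score in top10)
worst = min(score for _, score in top10)

# Each score as a percentage of the best score, one decimal place
pct_of_best = {name: round(100.0 * score / best, 1) for name, score in top10}

# Spread between the best and worst of the compared models
score_range = round(best - worst, 1)

for name, pct in pct_of_best.items():
    print(f"{name}: {pct}%")
print(f"Score range: {score_range}")
```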

accuracy (Primary)

#   Model                         Score  Date      Paper
1   CLIP4STR-H (DFN-5B)           99.1   May 2023
2   DTrOCR 105M                   98.9   Aug 2023
3   MGP-STR                       98.6   Sep 2022
4   CLIP4STR-L (DataComp-1B)      98.6   May 2023
5   CPPD                          98.5   Jul 2023
6   CLIP4STR-L                    98.5   May 2023
7   CLIP4STR-B*                   98.3   May 2023
8   CCD-ViT-Base(ARD_2.8M)        97.8   Nov 2022
9   CCD-ViT-Small(ARD_2.8M)       96.4   Nov 2022
10  CCD-ViT-Tiny(ARD_2.8M)        96.0   Nov 2022
11  S-GTR                         95.8   Dec 2021
12  SIGA_T                        95.1   Mar 2022
13  MATRN                         95.0   Nov 2021
14  Yet Another Text Recognizer   94.7   Jul 2021
15  NRTR+TPS++                    94.6   May 2023
16  DPAN                          93.9   Aug 2021  Look Back Again: Dual Parallel Attention Network for Accurate and Robust Scene Text Recognition
17  CDistNet (Ours)               93.82  Nov 2021
18  DiffusionSTR                  93.6   Jun 2023
19  RCEED                         91.8   Jun 2021
20  SRN                           91.5   Mar 2020
21  SATRN                         91.3   Oct 2019
22  CSTR                          90.6   Feb 2021
23  TextScanner                   90.1   Dec 2019
24  SEED                          89.6   May 2020
25  ASTER                         89.5   Jun 2018  ASTER: An Attentional Scene Text Recognizer with Flexible Rectification
26  DAN                           89.2   Dec 2019
27  SAFL                          88.6   Jan 2022
28  ViTSTR                        87.7   May 2021
29  Baek et al.                   87.5   Apr 2019
30  CA-FCN                        86.4   Sep 2018
31  SAR                           84.5   Nov 2018
32  STAR-Net                      83.6   Sep 2016  STAR-Net: A Spatial Attention Residue Network for Scene Text Recognition
33  RARE                          81.9   Mar 2016
34  CRNN                          80.8   Jul 2015
35  CHAR                          68.0   Jun 2014

Related Papers (27)

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
May 2023. Models: CLIP4STR-H (DFN-5B), CLIP4STR-L (DataComp-1B), CLIP4STR-L +1 more
Self-supervised Character-to-Character Distillation for Text Recognition
Nov 2022. Models: CCD-ViT-Base(ARD_2.8M), CCD-ViT-Small(ARD_2.8M), CCD-ViT-Tiny(ARD_2.8M)

Other Scene Text Recognition Datasets