Scene Text Recognition | 2020 | en

iiit5k

Dataset from Papers With Code

Metrics: accuracy, cer, wer, f1
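The page does not define these metrics, so the following is only a minimal sketch of how accuracy, CER, and WER are commonly computed for text recognition. The function names (`edit_distance`, `normalize`, `word_accuracy`, `cer`, `wer`) are illustrative, and the lowercase-alphanumeric normalization is an assumed evaluation protocol, not something this page specifies for every entry. F1 is left out because its definition for this benchmark is not stated here.

```python
import re


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalize(text):
    """Assumed protocol: lowercase, keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def word_accuracy(preds, refs):
    """Percentage of samples whose prediction matches the reference exactly."""
    correct = sum(p == r for p, r in zip(preds, refs))
    return 100.0 * correct / len(refs)


def cer(preds, refs):
    """Character error rate: character edits / reference characters."""
    edits = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    return edits / sum(len(r) for r in refs)


def wer(preds, refs):
    """Word error rate: word-level edits / reference words."""
    edits = sum(edit_distance(p.split(), r.split()) for p, r in zip(preds, refs))
    return edits / sum(len(r.split()) for r in refs)


# Toy data, not taken from the dataset.
preds = [normalize(p) for p in ["Hello", "w0rld"]]
refs = [normalize(r) for r in ["hello", "world"]]
print(word_accuracy(preds, refs))  # 50.0
print(cer(preds, refs))            # 0.1
```

Normalizing before comparison matters when comparing scores: a case-insensitive, alphanumeric-only evaluation will generally report higher accuracy than a case-sensitive one on the same predictions.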
Current State of the Art

CLIP4STR-L (DataComp-1B)

Unknown

99.6

accuracy

accuracy Progress Over Time

Showing 6 breakthroughs from Jul 2015 to May 2023

[Chart: accuracy (y-axis, 76.1 to 101.7) versus date (x-axis, Jul 2015 to May 2023)]

Key Milestones

| Date | Model | Notes | Accuracy | Change |
|---|---|---|---|---|
| Jul 2015 | CRNN | Lexicon-free. Table 2. TPAMI 2017. | 78.2 | |
| Mar 2021 | ABINet-LV | ABINet Language-Vision variant. CVPR 2021. | 96.4 | +23.3% |
| Nov 2021 | MATRN | From paper: Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features | 96.6 | +0.2% |
| Dec 2021 | S-GTR | From paper: Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition | 97.5 | +0.9% |
| Jul 2022 | PARSeq | Lowercase alphanum eval, 3000 test samples. ECCV 2022. | 99.0 | +1.5% |
| May 2023 | CLIP4STR-L (DataComp-1B) (current SOTA) | From paper: CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model | 99.6 | +0.6% |

Total Improvement: 27.4%
Time Span: 7y 11m
Breakthroughs: 6
Current SOTA: 99.6
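The percentage changes listed for the milestones appear to be relative gains over the previous milestone rather than absolute point differences. The page does not state this, so the sketch below is a check under that assumption; the helper name `relative_gain` is illustrative. The milestone scores are copied from this page.

```python
# Milestone scores as listed on this page (model, accuracy).
milestones = [
    ("CRNN", 78.2),
    ("ABINet-LV", 96.4),
    ("MATRN", 96.6),
    ("S-GTR", 97.5),
    ("PARSeq", 99.0),
    ("CLIP4STR-L (DataComp-1B)", 99.6),
]


def relative_gain(old, new):
    """Relative change from old to new, in percent of the old value."""
    return 100.0 * (new - old) / old


# Gain of each milestone over the one before it.
step_gains = [
    round(relative_gain(prev[1], curr[1]), 1)
    for prev, curr in zip(milestones, milestones[1:])
]

# Gain from the first milestone to the last, relative and in absolute points.
total_gain = round(relative_gain(milestones[0][1], milestones[-1][1]), 1)
absolute_gain = round(milestones[-1][1] - milestones[0][1], 1)

print(step_gains)     # [23.3, 0.2, 0.9, 1.5, 0.6]
print(total_gain)     # 27.4
print(absolute_gain)  # 21.4
```

Under this reading, the 27.4% total corresponds to an absolute gain of 21.4 accuracy points (78.2 to 99.6).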

Top Models Performance Comparison

Top 10 models ranked by accuracy

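| Rank | Model | accuracy | % of best |
|---|---|---|---|
| 1 | CLIP4STR-L (DataComp-1B) | 99.6 | 100.0% |
| 2 | DTrOCR 105M | 99.6 | 100.0% |
| 3 | CLIP4STR-L | 99.5 | 99.9% |
| 4 | CLIP4STR-B (DataComp-1B) | 99.5 | 99.9% |
| 5 | CPPD | 99.3 | 99.7% |
| 6 | CLIP4STR-B | 99.2 | 99.6% |
| 7 | PARSeq | 99.0 | 99.4% |
| 8 | MGP-STR | 98.8 | 99.2% |
| 9 | CCD-ViT-Small(ARD_2.8M) | 98.0 | 98.4% |
| 10 | CCD-ViT-Base(ARD_2.8M) | 98.0 | 98.4% |
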
Best Score: 99.6
Top Model: CLIP4STR-L (DataC...
Models Compared: 10
Score Range: 1.6
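The "% of best" values and the score range in this comparison can be reproduced from the ten listed scores. This is a small sketch that assumes "% of best" means each score divided by the top score and that the range is the top score minus the lowest of the ten; the variable names are illustrative.

```python
# Top 10 scores as listed on this page (model, accuracy).
top10 = [
    ("CLIP4STR-L (DataComp-1B)", 99.6),
    ("DTrOCR 105M", 99.6),
    ("CLIP4STR-L", 99.5),
    ("CLIP4STR-B (DataComp-1B)", 99.5),
    ("CPPD", 99.3),
    ("CLIP4STR-B", 99.2),
    ("PARSeq", 99.0),
    ("MGP-STR", 98.8),
    ("CCD-ViT-Small(ARD_2.8M)", 98.0),
    ("CCD-ViT-Base(ARD_2.8M)", 98.0),
]

scores = [score for _, score in top10]
best = max(scores)

# Each score as a percentage of the best score, rounded to one decimal.
pct_of_best = [round(100.0 * s / best, 1) for s in scores]

# Spread between the best and the worst of the compared models.
score_range = round(best - min(scores), 1)

print(pct_of_best)  # [100.0, 100.0, 99.9, 99.9, 99.7, 99.6, 99.4, 99.2, 98.4, 98.4]
print(score_range)  # 1.6
```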

accuracy (Primary)

| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | CLIP4STR-L (DataComp-1B) | 99.6 | | May 2023 |
| 2 | DTrOCR 105M | 99.6 | | Aug 2023 |
| 3 | CLIP4STR-L | 99.5 | | May 2023 |
| 4 | CLIP4STR-B (DataComp-1B) | 99.5 | | May 2023 |
| 5 | CPPD | 99.3 | | Jul 2023 |
| 6 | CLIP4STR-B | 99.2 | Research | May 2023 |
| 7 | PARSeq (Open Source) | 99.0 | Research | Jul 2022 |
| 8 | MGP-STR | 98.8 | | Sep 2022 |
| 9 | CCD-ViT-Small(ARD_2.8M) | 98.0 | | Nov 2022 |
| 10 | CCD-ViT-Base(ARD_2.8M) | 98.0 | | Nov 2022 |
| 11 | S-GTR | 97.5 | | Dec 2021 |
| 12 | DiffusionSTR | 97.3 | | Jun 2023 |
| 13 | CCD-ViT-Tiny(ARD_2.8M) | 97.1 | | Nov 2022 |
| 14 | SIGA_S | 96.9 | | Mar 2022 |
| 15 | MATRN | 96.6 | Research | Nov 2021 |
| 16 | CDistNet (Ours) | 96.57 | | Nov 2021 |
| 17 | ABINet-LV (Open Source) | 96.4 | Fang et al. | Mar 2021 |
| 18 | DPAN | 96.2 | Look Back Again: Dual Parallel Attention Network for Accurate and Robust Scene Text Recognition (Code) | Aug 2021 |
| 19 | TrOCR-large 558M | 94.1 | | Sep 2021 |
| 20 | TrOCR-base 334M | 93.4 | | Sep 2021 |
| 21 | CRNN | 78.2 | | Jul 2015 |

Related Papers (14)

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
May 2023. Models: CLIP4STR-L (DataComp-1B), CLIP4STR-L, CLIP4STR-B (DataComp-1B) +1 more

Self-supervised Character-to-Character Distillation for Text Recognition
Nov 2022. Models: CCD-ViT-Small(ARD_2.8M), CCD-ViT-Base(ARD_2.8M), CCD-ViT-Tiny(ARD_2.8M)

Other Scene Text Recognition Datasets