Codesota · Tasks · Scene Text Recognition

Scene Text Recognition.

Recognizing text in natural scene images

11 datasets · 127 results · Canonical metric: accuracy

§ 02 · Canonical benchmark

The reference dataset.

cute80

Dataset from Papers With Code

Primary metric: accuracy
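
This page does not define how accuracy is computed. A minimal sketch, assuming the word-level exact-match convention commonly used for scene text benchmarks: a prediction counts as correct only if the whole word matches the ground truth after normalization. The case-insensitive, alphanumeric-only normalization and the names `normalize` and `word_accuracy` are illustrative assumptions, not the protocol behind any score listed here.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits (assumed protocol)."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of words predicted exactly right after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one reference word is required")
    correct = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Hypothetical predictions and labels, for illustration only.
preds = ["Coffee", "SALE!", "exit", "str33t"]
refs = ["coffee", "sale", "exit", "street"]
score = word_accuracy(preds, refs)
print(f"word accuracy: {score:.1f}")
```

Under this convention a single wrong character makes the whole word count as an error, so small differences near the top of a leaderboard can correspond to only a handful of images on a small test set.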


§ 03 · Top 10

Leading models.

Leading models on cute80.

#	Model	accuracy	Year	Source
★	CPPD✓	99.7	2023	paper ↗
2	CLIP4STR-L (DataComp-1B)✓	99.7	2023	paper ↗
3	MGP-STR✓	99.3	2022	paper ↗
4	CLIP4STR-B✓	99.3	2023	paper ↗
5	DTrOCR 105M✓	99.1	2023	paper ↗
6	CLIP4STR-L✓	99.0	2023	paper ↗
7	PARSeq✓	98.6	2022	paper ↗
8	CCD-ViT-Small(ARD_2.8M)✓	98.3	2022	paper ↗
9	CCD-ViT-Base(ARD_2.8M)✓	98.3	2022	paper ↗
10	CCD-ViT-Tiny(ARD_2.8M)✓	95.8	2022	paper ↗


§ 04 · All datasets

Tracked datasets.

11 datasets tracked for this task.

cute80

CANONICAL

20 results · accuracy

Top: CPPD — 99.7

svt

40 results · accuracy

Top: CLIP4STR-H (DFN-5B) — 99.1

iiit5k

21 results · accuracy

Top: CLIP4STR-L (DataComp-1B) — 99.6

svtp

19 results · accuracy

Top: DTrOCR 105M — 98.6

icdar-2003

12 results · accuracy

Top: Yet Another Text Recognizer — 97.1

wost

5 results · accuracy

Top: CLIP4STR-H (DFN-5B) — 90.9

host

3 results · accuracy

Top: CLIP4STR-L — 82.7

uber-text

3 results · accuracy

Top: CLIP4STR-L (DataComp-1B) — 92.2

msda

2 results · accuracy

Top: MetaSelf-Learning — 42.0

ic13

1 result · accuracy

Top: ABINet-LV+TPS++ — 97.8

svt-p

1 result · accuracy

Top: ABINet-LV+TPS++ — 89.6

§ 05 · Related tasks

Other tasks in Computer Vision.

3D Understanding · Depth estimation · Document Image Classification · Document Layout Analysis · Document Parsing · Document Understanding · General OCR Capabilities · Handwriting Recognition
