Scene Text Recognition · 2023

Union14M: A Unified Benchmark for Scene Text Recognition

A next-generation STR benchmark with 4M labeled and 10M unlabeled images. Model accuracy drops 33-48% compared with standard benchmarks (IIIT5K, etc.), exposing real-world challenges such as artistic, multi-oriented, and occluded text.
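The accuracy figures throughout this page are word accuracy: the fraction of images whose predicted string exactly matches the ground truth. A minimal sketch of the metric, assuming the common STR convention of case-insensitive, alphanumeric-only comparison (exact normalization varies by paper):

```python
def word_accuracy(preds, gts):
    """Fraction of predictions exactly matching ground truth.

    Assumes the common STR evaluation convention: lowercase,
    alphanumeric characters only. Exact protocols vary per paper.
    """
    def norm(s):
        return "".join(c for c in s.lower() if c.isalnum())

    assert len(preds) == len(gts)
    hits = sum(norm(p) == norm(g) for p, g in zip(preds, gts))
    return hits / len(gts)

# 2 of 3 match after normalization ("caf" != "café")
print(word_accuracy(["Hello!", "world", "caf"],
                    ["hello", "World", "café"]))
```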

Samples: 14,000,000
Metrics: accuracy
Paper / Website
Current State of the Art

CLIP4STR-B (Research): 70.8 accuracy

accuracy Progress Over Time

Showing 3 breakthroughs from Nov 2021 to May 2023


Key Milestones

Nov 2021
CDistNet

CDistNet on the Union14M benchmark: 56.2% word accuracy (Table 4 in the Union14M paper, arXiv 2307.08723, ICCV 2023). An AAAI 2022 baseline.

56.2
Jul 2022
PARSeq

PARSeq on the Union14M benchmark: 67.8% word accuracy (Table 4 in the Union14M paper, arXiv 2307.08723, ICCV 2023). A strong ECCV 2022 baseline whose limits are exposed by the benchmark's real-world difficulty.

67.8
+20.6%
May 2023
CLIP4STR-B (Current SOTA)

CLIP4STR-B on the Union14M benchmark: 70.8% word accuracy. Reported in the Union14M paper (arXiv 2307.08723, ICCV 2023) and the CLIP4STR paper. Best model on Union14M at the time of benchmark publication.

70.8
+4.4%
Total Improvement
26.0%
Time Span
1y 6m
Breakthroughs
3
Current SOTA
70.8
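The per-milestone deltas (+20.6%, +4.4%) and the total improvement (26.0%) above are relative gains over the previous score, not absolute point differences. A sketch of the arithmetic:

```python
def relative_gain(prev, curr):
    """Relative improvement in percent: 100 * (curr - prev) / prev."""
    return 100.0 * (curr - prev) / prev

# Milestone scores from the timeline above.
milestones = [("CDistNet", 56.2), ("PARSeq", 67.8), ("CLIP4STR-B", 70.8)]

for (_, prev), (name, curr) in zip(milestones, milestones[1:]):
    print(f"{name}: +{relative_gain(prev, curr):.1f}%")

print(f"Total: +{relative_gain(56.2, 70.8):.1f}%")
```

Note that the absolute gain is only 14.6 points (56.2 to 70.8); the 26.0% figure is relative to the starting score.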

Top Models Performance Comparison

Top 8 models ranked by accuracy

Rank  Model       Accuracy  % of best
1     CLIP4STR-B  70.8      100.0%
2     PARSeq      67.8      95.8%
3     CLIP4STR    67.3      95.1%
4     LPV-S       65.1      91.9%
5     PARSeq      63.8      90.1%
6     MAERec-S    62.4      88.1%
7     MATRN       61.2      86.4%
8     CDistNet    56.2      79.4%
Best Score
70.8
Top Model
CLIP4STR-B
Models Compared
8
Score Range
14.6
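The "% of best" column normalizes each score against the top score, and the score range is the gap between the best and worst models. A sketch of how these values are derived from the table:

```python
# Scores from the comparison table above.
scores = [
    ("CLIP4STR-B", 70.8), ("PARSeq", 67.8), ("CLIP4STR", 67.3),
    ("LPV-S", 65.1), ("PARSeq", 63.8), ("MAERec-S", 62.4),
    ("MATRN", 61.2), ("CDistNet", 56.2),
]

best = max(s for _, s in scores)
for rank, (model, s) in enumerate(scores, start=1):
    # "% of best" = score / top score, in percent.
    print(f"{rank}. {model:<11} {s:5.1f}  {100 * s / best:5.1f}% of best")

print(f"Score range: {best - min(s for _, s in scores):.1f}")
```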

Primary metric: accuracy

#  Model       Score  Tags                   Date
1  CLIP4STR-B  70.8   Research               Mar 2026
2  PARSeq      67.8   Open Source, Research  Mar 2026
3  CLIP4STR    67.3   Open Source, Research  Mar 2026
4  LPV-S       65.1   Open Source, Research  Mar 2026
5  PARSeq      63.8   Open Source, Research  Mar 2026
6  MAERec-S    62.4   Open Source, Research  Mar 2026
7  MATRN       61.2   Research               Mar 2026
8  CDistNet    56.2   Open Source, Research  Mar 2026

Other Scene Text Recognition Datasets

Union14M Benchmark - Scene Text Recognition | CodeSOTA