Union14M
Next-generation scene text recognition benchmark assembled from 14 public datasets (4M labeled and 10M unlabeled images). Model accuracy drops 33-48% relative to standard benchmarks, exposing real-world limitations across seven challenge categories: Artistic, Multi-Oriented, Salient, Multi-Words, General, Contextless, and Incomplete.
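The leaderboard metric is word accuracy, typically reported per challenge subset and then averaged. Below is a minimal sketch of such a per-subset evaluation; the `predictions/<subset>.tsv` file layout and the case-insensitive, alphanumeric-only normalization are illustrative assumptions, not the benchmark's official tooling.

```python
import re
from pathlib import Path

# The seven Union14M-Benchmark challenge subsets.
SUBSETS = ["artistic", "multi_oriented", "salient", "multi_words",
           "general", "contextless", "incomplete"]

def normalize(text: str) -> str:
    """Common STR convention (assumed here): case-insensitive, alphanumeric-only."""
    return re.sub(r"[^0-9a-z]", "", text.lower())

def word_accuracy(pairs) -> float:
    """Exact-match word accuracy over (prediction, ground_truth) pairs."""
    if not pairs:
        return 0.0
    correct = sum(normalize(pred) == normalize(gt) for pred, gt in pairs)
    return correct / len(pairs)

def evaluate(pred_dir: Path) -> dict:
    """Assumes one '<subset>.tsv' file per subset, each line 'prediction<TAB>ground_truth'."""
    scores = {}
    for name in SUBSETS:
        lines = (pred_dir / f"{name}.tsv").read_text(encoding="utf-8").splitlines()
        pairs = [tuple(line.split("\t", 1)) for line in lines if "\t" in line]
        scores[name] = word_accuracy(pairs)
    # Benchmark-level score: mean of the per-subset accuracies.
    scores["avg"] = sum(scores[s] for s in SUBSETS) / len(SUBSETS)
    return scores

if __name__ == "__main__":
    for subset, acc in evaluate(Path("predictions")).items():
        print(f"{subset:>15}: {acc:.1%}")
```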
Benchmark Stats
Metric: accuracy (higher is better)

SOTA History
| Rank | Model | Notes | Source | Score (word acc. %) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | CLIP4STR-B | Reported in the Union14M paper (arXiv 2307.08723, ICCV 2023) and the CLIP4STR paper; best model on Union14M-Benchmark at the time of benchmark publication. | Community | 70.8 | 2023 | Source |
| 2 | PARSeq | Table 4 of the Union14M paper (arXiv 2307.08723, ICCV 2023); strong ECCV 2022 baseline whose real-world limitations the benchmark exposes. | Community | 67.8 | 2022 | Source |
| 3 | LPV-S | Language-Guided Progressive Vision (Small); Table 4 of the Union14M paper (arXiv 2307.08723, ICCV 2023). | Community | 65.1 | 2023 | Source |
| 4 | MAERec-S | Table 4 of the Union14M paper (arXiv 2307.08723, ICCV 2023); MAE pre-training for text recognition. | Community | 62.4 | 2023 | Source |
| 5 | CDistNet | Table 4 of the Union14M paper (arXiv 2307.08723, ICCV 2023); AAAI 2022 baseline. | Community | 56.2 | 2022 | Source |