Union14M: A Unified Benchmark for Scene Text Recognition
Next-generation STR benchmark with 4M labeled and 10M unlabeled images. Accuracy drops by 33-48% versus the standard benchmarks (IIIT5K, etc.), exposing real-world challenges such as artistic, multi-oriented, and occluded text.
Current best: CLIP4STR-B — 70.8% accuracy.
Accuracy Progress Over Time
Showing 3 breakthroughs from Nov 2021 to May 2023
Key Milestones
CDistNet on Union14M-Benchmark: 56.2% word accuracy (Table 4 in the Union14M paper, arXiv 2307.08723, ICCV 2023). AAAI 2022 baseline.
PARSeq on Union14M-Benchmark: 67.8% word accuracy (Table 4 in the Union14M paper, arXiv 2307.08723, ICCV 2023). A strong ECCV 2022 baseline whose limits are exposed by the benchmark's real-world difficulty.
CLIP4STR-B on Union14M-Benchmark: 70.8% word accuracy, reported in the Union14M paper (arXiv 2307.08723, ICCV 2023) and the CLIP4STR paper. Best model on Union14M at the time of the benchmark's publication.
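All milestones above report word accuracy, i.e. the fraction of images whose full predicted string matches the ground truth. A minimal sketch of that metric, assuming the common STR evaluation convention of case-insensitive, alphanumeric-only comparison (the paper's exact protocol may differ):

```python
import re


def normalize(text: str) -> str:
    # Common STR convention (an assumption here): lowercase,
    # keep only alphanumeric characters.
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(preds: list[str], labels: list[str]) -> float:
    # Fraction of samples where the whole predicted string
    # matches the label after normalization.
    assert len(preds) == len(labels) and labels
    correct = sum(normalize(p) == normalize(g) for p, g in zip(preds, labels))
    return correct / len(labels)


# Example: 2 of 3 predictions match after normalization.
print(word_accuracy(["Hello!", "w0rld", "STR"], ["hello", "world", "str"]))
```

A score of 70.8 on the leaderboard corresponds to `word_accuracy` returning 0.708 over the benchmark's test images.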
Top Models Performance Comparison
Top 8 models ranked by accuracy
Primary metric: accuracy
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | CLIP4STR-B | 70.8 | Research | Mar 2026 |
| 2 | PARSeq | 67.8 | Open Source, Research | Mar 2026 |
| 3 | CLIP4STR | 67.3 | Open Source, Research | Mar 2026 |
| 4 | LPV-S | 65.1 | Open Source, Research | Mar 2026 |
| 5 | PARSeq | 63.8 | Open Source, Research | Mar 2026 |
| 6 | MAERec-S | 62.4 | Open Source, Research | Mar 2026 |
| 7 | MATRN | 61.2 | Research | Mar 2026 |
| 8 | CDistNet | 56.2 | Open Source, Research | Mar 2026 |