iiit5k.

iiit5k (the IIIT 5K-Word dataset) is a scene text recognition benchmark indexed on Codesota. This page tracks published model results, top scores per metric, and the SOTA timeline for iiit5k.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

Accuracy

Accuracy is the reported evaluation metric for iiit5k. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
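
The page does not spell out how Accuracy is computed. As a minimal sketch, assuming it is word-level exact-match accuracy (the usual convention for scene text recognition) and assuming the lowercase alphanumeric normalization that one leaderboard entry mentions, the metric could be computed as below. The function names and the toy data are hypothetical and are not taken from any listed paper.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the string and keep only ASCII letters and digits.

    This mirrors a lowercase alphanumeric protocol; it is an assumption,
    since individual papers may evaluate with a different character set.
    """
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(predictions, references) -> float:
    """Return the percentage of words whose normalized prediction
    exactly matches the normalized reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sample is required")
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Hypothetical toy data, not drawn from iiit5k.
preds = ["Hello", "w0rld", "text!", "scene"]
refs = ["hello", "world", "TEXT", "scene"]
score = word_accuracy(preds, refs)
print(f"{score:.1f}")
```

Because the normalization changes which predictions count as correct, scores reported under different protocols (for example, case-sensitive versus lowercase alphanumeric) may not be directly comparable.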

Trust tiers for Accuracy: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score | Year | Source or note |
|---|---|---|---|---|---|
| 01 | CLIP4STR-L (DataComp-1B) | verified | 99.6 | 2023 | From paper: CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model |
| 02 | DTrOCR 105M | verified | 99.6 | 2023 | From paper: DTrOCR: Decoder-only Transformer for Optical Character Recognition |
| 03 | CLIP4STR-B (DataComp-1B) | verified | 99.5 | 2023 | From paper: CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model |
| 04 | CLIP4STR-L | verified | 99.5 | 2023 | From paper: CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model |
| 05 | CPPD | verified | 99.3 | 2023 | From paper: Context Perception Parallel Decoder for Scene Text Recognition |
| 06 | CLIP4STR-B | verified | 99.2 | 2023 | From paper: CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model |
| 07 | PARSeq | verified | 99 | 2022 | Lowercase alphanum eval, 3000 test samples. ECCV 2022. |
| 08 | MGP-STR | verified | 98.8 | 2022 | From paper: Multi-Granularity Prediction for Scene Text Recognition |
| 09 | CCD-ViT-Small(ARD_2.8M) | verified | 98 | 2022 | From paper: Self-supervised Character-to-Character Distillation for Text Recognition |
| 10 | CCD-ViT-Base(ARD_2.8M) | verified | 98 | 2022 | From paper: Self-supervised Character-to-Character Distillation for Text Recognition |
| 11 | S-GTR | verified | 97.5 | 2021 | From paper: Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition |
| 12 | DiffusionSTR | verified | 97.3 | 2023 | From paper: DiffusionSTR: Diffusion Model for Scene Text Recognition |
| 13 | CCD-ViT-Tiny(ARD_2.8M) | verified | 97.1 | 2022 | From paper: Self-supervised Character-to-Character Distillation for Text Recognition |
| 14 | SIGA_S | verified | 96.9 | 2022 | From paper: Self-supervised Implicit Glyph Attention for Text Recognition |
| 15 | MATRN | verified | 96.6 | 2021 | From paper: Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features |
| 16 | CDistNet (Ours) | verified | 96.57 | 2021 | From paper: CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition |
| 17 | ABINet-LV | verified | 96.4 | 2021 | ABINet Language-Vision variant. CVPR 2021. |
| 18 | DPAN | verified | 96.2 | 2021 | From paper: Look Back Again: Dual Parallel Attention Network for Accurate and Robust Scene Text Recognition |
| 19 | TrOCR-large 558M | verified | 94.1 | 2021 | TrOCR-large, Syn+Benchmark training. Table 6. AAAI 2023. |
| 20 | TrOCR-base 334M | verified | 93.4 | 2021 | TrOCR-base, Syn+Benchmark training. Table 6. AAAI 2023. |
| 21 | CRNN | verified | 78.2 | 2015 | Lexicon-free. Table 2. TPAMI 2017. |