Who leads the iiit5k benchmark?

CLIP4STR-L (DataComp-1B) currently ranks first on iiit5k with an Accuracy of 99.6, a score that DTrOCR 105M matches.

What is the state-of-the-art score on iiit5k?

The state-of-the-art result on iiit5k is 99.6 (Accuracy), achieved by CLIP4STR-L (DataComp-1B) as of 2023.

How many models are tracked on iiit5k?

Codesota tracks 21 models on iiit5k.

When was the iiit5k leaderboard last updated?

The iiit5k leaderboard on Codesota includes results through 2023, with the earliest tracked result from 2015.

iiit5k.

Name: iiit5k Benchmark Results
Creator: Unknown
Published: 2015-01-01
License: https://creativecommons.org/licenses/by/4.0/

iiit5k is a scene text recognition benchmark indexed on Codesota. This page tracks published model results, top scores per metric, and the SOTA timeline for iiit5k.

§ 01 · SOTA history

Year over year.
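
A year-over-year SOTA history can be derived from the leaderboard rows themselves. The sketch below is illustrative only: the `results` list holds a handful of rows copied from the Accuracy table, and `sota_timeline` is a hypothetical helper, not part of any Codesota API. It keeps the best score per year and then reports only the years in which the running best improved.

```python
# Hypothetical (year, model, accuracy) tuples copied from a few leaderboard rows.
results = [
    (2015, "CRNN", 78.2),
    (2021, "S-GTR", 97.5),
    (2021, "MATRN", 96.6),
    (2022, "PARSeq", 99.0),
    (2022, "MGP-STR", 98.8),
    (2023, "CLIP4STR-L (DataComp-1B)", 99.6),
    (2023, "CPPD", 99.3),
]


def sota_timeline(rows):
    """Return (year, model, score) for each year in which the best score improved."""
    # Best result within each year (higher accuracy is better).
    best_by_year = {}
    for year, model, score in rows:
        if year not in best_by_year or score > best_by_year[year][1]:
            best_by_year[year] = (model, score)

    # Walk the years in order and keep only strict improvements.
    timeline = []
    running_best = float("-inf")
    for year in sorted(best_by_year):
        model, score = best_by_year[year]
        if score > running_best:
            running_best = score
            timeline.append((year, model, score))
    return timeline


timeline = sota_timeline(results)
for year, model, score in timeline:
    print(year, model, score)
```

On this sample, the timeline has one entry each for 2015, 2021, 2022, and 2023. Years with no tracked result simply do not appear.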

§ 02 · Leaderboard

Results by metric.

Accuracy

Accuracy is the reported evaluation metric for iiit5k. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Accuracy: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.
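
The muting rule can be read as a simple predicate over rows. The following is a minimal sketch under stated assumptions: `Result` and `is_muted` are hypothetical names, the comparison is assumed to be strict (a tie does not mute a row), and only the year is used because the table does not list finer-grained dates. The sample rows are copied from the table.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    score: float
    year: int


def is_muted(row, rows):
    """True if an earlier or same-year result scored strictly better (assumed rule)."""
    return any(
        other.year <= row.year and other.score > row.score
        for other in rows
    )


# A few rows copied from the Accuracy table.
rows = [
    Result("CLIP4STR-L (DataComp-1B)", 99.6, 2023),
    Result("DTrOCR 105M", 99.6, 2023),
    Result("CLIP4STR-B (DataComp-1B)", 99.5, 2023),
    Result("PARSeq", 99.0, 2022),
    Result("MGP-STR", 98.8, 2022),
    Result("CRNN", 78.2, 2015),
]

muted = [r.model for r in rows if is_muted(r, rows)]
print(muted)
```

Under the strict-comparison assumption, the two models tied at 99.6 both stay unmuted. If the site breaks ties differently, the predicate would need a tie-breaking field that this page does not show.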

Rank	Model	Paper / notes	Trust	Score	Year
01	CLIP4STR-L (DataComp-1B)	CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model	verified	99.6	2023
02	DTrOCR 105M	DTrOCR: Decoder-only Transformer for Optical Character Recognition	verified	99.6	2023
03	CLIP4STR-B (DataComp-1B)	CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model	verified	99.5	2023
04	CLIP4STR-L	CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model	verified	99.5	2023
05	CPPD	Context Perception Parallel Decoder for Scene Text Recognition	verified	99.3	2023
06	CLIP4STR-B	CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model	verified	99.2	2023
07	PARSeq	Lowercase alphanum eval, 3000 test samples. ECCV 2022.	verified	99	2022
08	MGP-STR	Multi-Granularity Prediction for Scene Text Recognition	verified	98.8	2022
09	CCD-ViT-Small(ARD_2.8M)	Self-supervised Character-to-Character Distillation for Text Recognition	verified	98	2022
10	CCD-ViT-Base(ARD_2.8M)	Self-supervised Character-to-Character Distillation for Text Recognition	verified	98	2022
11	S-GTR	Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition	verified	97.5	2021
12	DiffusionSTR	DiffusionSTR: Diffusion Model for Scene Text Recognition	verified	97.3	2023
13	CCD-ViT-Tiny(ARD_2.8M)	Self-supervised Character-to-Character Distillation for Text Recognition	verified	97.1	2022
14	SIGA_S	Self-supervised Implicit Glyph Attention for Text Recognition	verified	96.9	2022
15	MATRN	Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features	verified	96.6	2021
16	CDistNet	CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition	verified	96.57	2021
17	ABINet-LV	ABINet Language-Vision variant. CVPR 2021.	verified	96.4	2021
18	DPAN	Look Back Again: Dual Parallel Attention Network for Accurate and Robust Scene Text Recognition	verified	96.2	2021
19	TrOCR-large 558M	TrOCR-large, Syn+Benchmark training. Table 6. AAAI 2023.	verified	94.1	2021
20	TrOCR-base 334M	TrOCR-base, Syn+Benchmark training. Table 6. AAAI 2023.	verified	93.4	2021
21	CRNN	Lexicon-free. Table 2. TPAMI 2017.	verified	78.2	2015
