Who leads the svtp benchmark?

DTrOCR 105M currently leads svtp with a score of 98.60 on accuracy.

What is the state-of-the-art score on svtp?

The state-of-the-art result on svtp is 98.60 (accuracy), achieved by DTrOCR 105M as of 2023.

How many models are tracked on svtp?

Codesota tracks 19 models on svtp.

When was the svtp leaderboard last updated?

The svtp leaderboard on Codesota includes results through 2023, with the earliest tracked result from 2021.

Scene Text Recognition · benchmark dataset · 2020 · EN

svtp.

Name: svtp Benchmark Results
Creator: Codesota
Published: 2021-01-01
License: https://creativecommons.org/licenses/by/4.0/

Dataset from Papers With Code

§ 01 · Leaderboard

Best published scores.

19 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.

Primary: accuracy · higher is better

#	Model	Org	Submitted	Paper / code	accuracy
01	DTrOCR 105M	—	Aug 2023	DTrOCR: Decoder-only Transformer for Optical Character R… · code	98.60
02	MGP-STR	—	Sep 2022	Multi-Granularity Prediction for Scene Text Recognition · code	98.30
03	CLIP4STR-L (DataComp-1B)	—	May 2023	CLIP4STR: A Simple Baseline for Scene Text Recognition w… · code	98.10
04	CLIP4STR-L	—	May 2023	CLIP4STR: A Simple Baseline for Scene Text Recognition w… · code	97.40
05	CLIP4STR-B	Research	May 2023	CLIP4STR: A Simple Baseline for Scene Text Recognition w… · code	97.20
06	PARSeq	Research	Jul 2022	Scene Text Recognition with Permuted Autoregressive Sequ…	96.90
07	CPPD	—	Jul 2023	Context Perception Parallel Decoder for Scene Text Recog… · code	96.70
08	CCD-ViT-Base	—	Nov 2022	Self-supervised Character-to-Character Distillation for … · code	96.10
09	CCD-ViT-Small	—	Nov 2022	Self-supervised Character-to-Character Distillation for … · code	92.70
10	CCD-ViT-Tiny	—	Nov 2022	Self-supervised Character-to-Character Distillation for … · code	91.60
11	MATRN	Research	Nov 2021	Multi-modal Text Recognition Networks: Interactive Enhan… · code	90.60
12	S-GTR	—	Dec 2021	Visual Semantics Allow for Textual Reasoning Better in S… · code	90.60
13	SIGA_T	—	Mar 2022	Self-supervised Implicit Glyph Attention for Text Recogn… · code	90.50
14	CDistNet (Ours)	—	Nov 2021	CDistNet: Perceiving Multi-Domain Character Distance for… · code	89.77
15	ABINet-LV	Fang et al.	Mar 2021	Read Like Humans: Autonomous, Bidirectional and Iterativ…	89.50
16	DiffusionSTR	—	Jun 2023	DiffusionSTR: Diffusion Model for Scene Text Recognition	89.20
17	DPAN	—	Aug 2021	papers-with-code · code	89
18	TrOCR-large 558M	—	Sep 2021	TrOCR: Transformer-based Optical Character Recognition w…	88.10
19	TrOCR-base 334M	—	Sep 2021	TrOCR: Transformer-based Optical Character Recognition w…	86.90

Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
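The ordering rule stated above (higher accuracy first, ties broken by submission date) can be sketched in a few lines of Python. This is an illustration, not Codesota's actual code: the function name `rank` is hypothetical, the rule "earlier date wins a tie" is an assumption inferred from MATRN (Nov 2021) being listed above S-GTR (Dec 2021) at the same score, and the day of month is a placeholder because the table gives only month and year.

```python
from datetime import date

# (model, submitted, accuracy) for four rows copied from the table above.
# day=1 is a placeholder: the table lists only month and year.
rows = [
    ("S-GTR", date(2021, 12, 1), 90.60),
    ("SIGA_T", date(2022, 3, 1), 90.50),
    ("MATRN", date(2021, 11, 1), 90.60),
    ("DTrOCR 105M", date(2023, 8, 1), 98.60),
]

def rank(rows):
    """Sort by accuracy descending; on equal accuracy, earlier date first (assumed)."""
    return sorted(rows, key=lambda r: (-r[2], r[1]))

for position, (model, submitted, accuracy) in enumerate(rank(rows), start=1):
    print(f"{position:02d}  {model:<12}  {submitted:%b %Y}  {accuracy:.2f}")
```

Negating the score in the sort key keeps a single ascending sort while still placing the highest accuracy first.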

§ 03 · Progress

6 steps of state of the art.

Each row below marks a model that broke the previous record on accuracy. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · accuracy

Date	Model	Org	accuracy
Mar 6, 2021	ABINet-LV	Fang et al.	89.50
Nov 22, 2021	CDistNet (Ours)	—	89.77
Nov 30, 2021	MATRN	Research	90.60
Jul 14, 2022	PARSeq	Research	96.90
Sep 8, 2022	MGP-STR	—	98.30
Aug 30, 2023	DTrOCR 105M	—	98.60

Fig 3 · SOTA-setting models only. 6 entries span Mar 2021 → Aug 2023.
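The record-setting rule described in this section (a model is re-listed only if it beat the previous best) amounts to a running maximum over the results in chronological order. The following is a minimal sketch, not Codesota's implementation: `sota_steps` is a hypothetical name, a strict improvement is assumed (so a tie such as S-GTR at 90.60 does not count), and dates use the exact day from the SOTA line where one is listed and day 1 as a placeholder elsewhere.

```python
from datetime import date

# (model, date, accuracy) for all 19 leaderboard rows.
# Days are taken from the SOTA line where listed; day=1 is a placeholder
# for rows where the leaderboard gives only month and year.
results = [
    ("DTrOCR 105M", date(2023, 8, 30), 98.60),
    ("MGP-STR", date(2022, 9, 8), 98.30),
    ("CLIP4STR-L (DataComp-1B)", date(2023, 5, 1), 98.10),
    ("CLIP4STR-L", date(2023, 5, 1), 97.40),
    ("CLIP4STR-B", date(2023, 5, 1), 97.20),
    ("PARSeq", date(2022, 7, 14), 96.90),
    ("CPPD", date(2023, 7, 1), 96.70),
    ("CCD-ViT-Base", date(2022, 11, 1), 96.10),
    ("CCD-ViT-Small", date(2022, 11, 1), 92.70),
    ("CCD-ViT-Tiny", date(2022, 11, 1), 91.60),
    ("MATRN", date(2021, 11, 30), 90.60),
    ("S-GTR", date(2021, 12, 1), 90.60),
    ("SIGA_T", date(2022, 3, 1), 90.50),
    ("CDistNet (Ours)", date(2021, 11, 22), 89.77),
    ("ABINet-LV", date(2021, 3, 6), 89.50),
    ("DiffusionSTR", date(2023, 6, 1), 89.20),
    ("DPAN", date(2021, 8, 1), 89.0),
    ("TrOCR-large 558M", date(2021, 9, 1), 88.10),
    ("TrOCR-base 334M", date(2021, 9, 1), 86.90),
]

def sota_steps(results):
    """Return the entries that strictly beat every earlier score."""
    steps, best = [], float("-inf")
    for model, when, score in sorted(results, key=lambda r: r[1]):
        if score > best:  # strict: equalling the record is not a new step
            steps.append((model, when, score))
            best = score
    return steps

for model, when, score in sota_steps(results):
    print(f"{when:%b %d, %Y}  {model:<16}  {score:.2f}")
```

Under these assumptions the sketch yields the same six entries as the SOTA line above, from ABINet-LV through DTrOCR 105M.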

§ 04 · Literature

13 papers tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

DTrOCR: Decoder-only Transformer for Optical Character Recognition
Aug 2023 · DTrOCR 105M
arXiv · Code

Context Perception Parallel Decoder for Scene Text Recognition
Jul 2023 · CPPD
arXiv · Code

DiffusionSTR: Diffusion Model for Scene Text Recognition
Jun 2023 · DiffusionSTR
arXiv

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
May 2023 · CLIP4STR-L (DataComp-1B), CLIP4STR-L, CLIP4STR-B
arXiv · Code

Self-supervised Character-to-Character Distillation for Text Recognition
Nov 2022 · CCD-ViT-Base, CCD-ViT-Small, CCD-ViT-Tiny
arXiv · Code

Multi-Granularity Prediction for Scene Text Recognition
Sep 2022 · MGP-STR
arXiv · Code

Scene Text Recognition with Permuted Autoregressive Sequence Models
Jul 2022 · PARSeq
arXiv

Self-supervised Implicit Glyph Attention for Text Recognition
Mar 2022 · SIGA_T
arXiv · Code

Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition
Dec 2021 · S-GTR
arXiv · Code

Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features
Nov 2021 · MATRN
arXiv · Code

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition
Nov 2021 · CDistNet (Ours)
arXiv · Code

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Sep 2021 · TrOCR-large 558M, TrOCR-base 334M
arXiv

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Mar 2021 · ABINet-LV
arXiv

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs

01 A public checkpoint or API endpoint
02 A reproduction script with frozen commit + seed
03 Declared evaluation environment (Python, deps)
04 One row per metric declared by this dataset
05 A contact so we can follow up on discrepancies
