Who leads the coco-text benchmark?

CLIP4STR-L currently leads coco-text with a score of 81.9 on 1:1 Accuracy.

What is the state-of-the-art score on coco-text?

The state-of-the-art result on coco-text is 81.9 (1:1 Accuracy), achieved by CLIP4STR-L as of 2026.

How many models are tracked on coco-text?

Codesota tracks 21 models on coco-text across 4 metrics.

When was the coco-text leaderboard last updated?

The coco-text leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2016.

coco-text.

Name: coco-text Benchmark Results
Creator: Unknown
Published: 2016-01-01
License: https://creativecommons.org/licenses/by/4.0/

coco-text is a machine learning benchmark indexed on Codesota. This page tracks published model results, top scores per metric, and the SOTA timeline for coco-text.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

1:1 Accuracy

1:1 Accuracy is one of the reported evaluation metrics for coco-text. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for 1:1 Accuracy: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Description	Trust	Score	Year	Links
01	CLIP4STR-L	From paper: CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model	verified	81.9	2023	Paper, Code
02	MGP-STR	From paper: Multi-Granularity Prediction for Scene Text Recognition	verified	81.7	2022	Paper, Code
03	CLIP4STR-B	From paper: CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model	verified	81.1	2023	Paper, Code
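
The muting rule stated above the table can be sketched in a few lines. This is only an illustration of the rule as worded on this page, not Codesota's actual implementation; the `Result` class and the `muted` function are hypothetical names, and the rows are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    score: float
    year: int


def muted(results):
    """Return models that were not state of the art when published.

    Assumes higher is better: a row is muted when some other row from
    the same year or an earlier year has a strictly higher score.
    """
    out = []
    for r in results:
        beaten = any(
            o is not r and o.year <= r.year and o.score > r.score
            for o in results
        )
        if beaten:
            out.append(r.model)
    return out


# Rows from the accuracy table on this page.
rows = [
    Result("CLIP4STR-L", 81.9, 2023),
    Result("MGP-STR", 81.7, 2022),
    Result("CLIP4STR-B", 81.1, 2023),
]

print(muted(rows))  # ['CLIP4STR-B']
```

Under this reading, MGP-STR stays unmuted because no result from 2022 or earlier beats it, while CLIP4STR-B is muted by the higher same-year CLIP4STR-L score.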

F Measure

F Measure is one of the reported evaluation metrics for coco-text. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for F Measure: verified, paper, vendor, community, unverified

Rank	Model	Description	Trust	Score	Year	Links
01	TCM	CLIP-based detector with joint-dataset training. IJCAI 2025.	paper	65.9	2026	Source
02	PANet (Joint)	Pixel Aggregation Network with joint-dataset fine-tuning.	unverified	64.5	2026	Source
03	LRANet	Low-Rank Approximation Network. AAAI 2024 Oral.	unverified	61.7	2026	Source
04	DPText-DETR	DETR-based with dynamic point queries. Joint training. AAAI 2023.	unverified	61.6	2026	Source
05	MAEDet	MAE-based self-supervised pretraining for text detection. IJCAI 2025.	unverified	60.6	2026	Source
06	DBNet	Differentiable Binarization with fine-tuning. AAAI 2020.	unverified	60.5	2026	Source
07	DBNet++	DB with Adaptive Scale Fusion. Joint training. TPAMI 2022.	paper	59.5	2026	Source
08	SRFormer	Segmentation+Regression Transformer. AAAI 2024.	unverified	59.4	2026	Source
09	Corner-based Region Proposals	From paper: Detecting Multi-Oriented Text with Corner-based Region Proposals	verified	59.1	2018	Paper, Code
10	TextBoxes++_MS	From paper: TextBoxes++: A Single-Shot Oriented Scene Text Detector	verified	58.72	2018	Paper, Code
11	FCENet	Fourier Contour Embedding. CVPR 2021.	paper	57.9	2026	Source
12	PSENet	Progressive Scale Expansion Network. CVPR 2019.	paper	56	2026	Source
13	ABCNet v2	Adaptive Bezier-Curve Network v2. TPAMI 2021.	paper	53.2	2026	Source
14	EAST + VGG16	From paper: EAST: An Efficient and Accurate Scene Text Detector	verified	39.45	2017	Paper, Code
15	SSTD	From paper: Single Shot Text Detector with Regional Attention	verified	37	2017	Paper, Code
16	WordSup (VGG16-synth-coco)	From paper: WordSup: Exploiting Word Annotations for Character based Text Detection	verified	36.8	2017	Paper
17	Yao et al.	From paper: Scene Text Detection via Holistic, Multi-Channel Prediction	verified	33.31	2016	Paper
18	DRRG	Deep Relational Reasoning Graph. CVPR 2020.	unverified	31.9	2026	Source

Recall

Recall is one of the reported evaluation metrics for coco-text. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Recall: verified, paper, vendor, community, unverified

Rank	Model	Description	Trust	Score	Year	Links
01	Corner-based Region Proposals	From paper: Detecting Multi-Oriented Text with Corner-based Region Proposals	verified	63.3	2018	Paper, Code
02	TextBoxes++_MS	From paper: TextBoxes++: A Single-Shot Oriented Scene Text Detector	verified	56.7	2018	Paper, Code
03	EAST + VGG16	From paper: EAST: An Efficient and Accurate Scene Text Detector	verified	32.4	2017	Paper, Code
04	SSTD	From paper: Single Shot Text Detector with Regional Attention	verified	31	2017	Paper, Code
05	WordSup (VGG16-synth-coco)	From paper: WordSup: Exploiting Word Annotations for Character based Text Detection	verified	30.9	2017	Paper
06	Yao et al.	From paper: Scene Text Detection via Holistic, Multi-Channel Prediction	verified	27.1	2016	Paper

Precision

Precision is one of the reported evaluation metrics for coco-text. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Precision: verified, paper, vendor, community, unverified

Rank	Model	Description	Trust	Score	Year	Links
01	TextBoxes++_MS	From paper: TextBoxes++: A Single-Shot Oriented Scene Text Detector	verified	60.87	2018	Paper, Code
02	Corner-based Region Proposals	From paper: Detecting Multi-Oriented Text with Corner-based Region Proposals	verified	55.5	2018	Paper, Code
03	EAST + VGG16	From paper: EAST: An Efficient and Accurate Scene Text Detector	verified	50.39	2017	Paper, Code
04	SSTD	From paper: Single Shot Text Detector with Regional Attention	verified	46	2017	Paper, Code
05	WordSup (VGG16-synth-coco)	From paper: WordSup: Exploiting Word Annotations for Character based Text Detection	verified	45.2	2017	Paper
06	Yao et al.	From paper: Scene Text Detection via Holistic, Multi-Channel Prediction	verified	43.23	2016	Paper
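
Six models appear in the F Measure, Recall, and Precision tables, so their rows can be cross-checked against one another. The sketch below assumes the F Measure listed here is the standard harmonic mean of precision and recall, F = 2PR / (P + R); the page itself does not state the formula. The helper name `f_measure` and the `rows` dictionary are illustrative only, and the numbers are copied from the three tables on this page.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, listed F Measure) taken from the tables on this page.
rows = {
    "Corner-based Region Proposals": (55.5, 63.3, 59.1),
    "TextBoxes++_MS": (60.87, 56.7, 58.72),
    "EAST + VGG16": (50.39, 32.4, 39.45),
    "SSTD": (46, 31, 37),
    "WordSup (VGG16-synth-coco)": (45.2, 30.9, 36.8),
    "Yao et al.": (43.23, 27.1, 33.31),
}

# Largest absolute gap between recomputed and listed F Measure.
max_gap = 0.0
for name, (p, r, listed) in rows.items():
    computed = f_measure(p, r)
    max_gap = max(max_gap, abs(computed - listed))
    print(f"{name}: computed {computed:.2f}, listed {listed}")
```

Under that assumption, the recomputed values agree with the listed F Measure scores to within roughly 0.1 point, which is consistent with rounding in the published precision and recall figures.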
