DPNet (ResNet-50, 736px).

Fang et al. · open-source · Unknown params · ResNet-50 + Channel Enhanced Self-Attention Module (CESAM) + Spatial Enhanced Self-Attention Module (SESAM)

Dual Perspective CNN-Transformer for scene text detection. Integrates CESAM and SESAM into a ResNet backbone. 736×736 input. Published in PLOS ONE, 2024. DOI: 10.1371/journal.pone.0309286.

§ 02 · Benchmarks

Every benchmark DPNet (ResNet-50, 736px) has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	msra-td500	Computer Vision · Scene Text Detection	f-measure	86.7%	#9/24	2024-10-15	source ↗
02	msra-td500	Computer Vision · Scene Text Detection	precision	91.4%	#10/23	2024-10-15	source ↗
03	msra-td500	Computer Vision · Scene Text Detection	recall	82.5%	#11/24	2024-10-15	source ↗

The Rank column shows this model’s position among all models scored on the same benchmark and metric (the number after the slash is the count of competitors). #1 in red means current SOTA. Rows are sorted by rank, then by newest result.
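
The three msra-td500 rows are related if the f-measure is the standard harmonic mean of precision and recall. That is an assumption here, since the page does not state the formula. A minimal sketch of the check, where the function name `f_measure` is hypothetical and the inputs are the precision and recall values from the table:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1; assumed, not stated by the page)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the msra-td500 rows in the table, as fractions.
precision = 0.914
recall = 0.825

f1 = f_measure(precision, recall)
print(f"{f1 * 100:.1f}%")  # prints "86.7%"
```

Under that assumption, the recorded precision and recall reproduce the recorded f-measure of 86.7% to one decimal place.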

§ 03 · Strengths by area