Scene Text Recognition
Recognizing text in natural scene images
Scene Text Recognition is the computer vision task of reading text that appears in natural scene images. The standard benchmarks used to evaluate models are listed below, along with the current state-of-the-art result on each.
Benchmarks & SOTA
Datasets are from Papers With Code.

Dataset     State of the Art              Score   Metric
svt         CLIP4STR-H (DFN-5B)           99.1    accuracy
cute80      CPPD                          99.7    accuracy
iiit5k      DTrOCR 105M                   99.6    accuracy
svtp        DTrOCR 105M                   98.6    accuracy
icdar-2003  Yet Another Text Recognizer   97.1    accuracy
wost        CLIP4STR-H (DFN-5B)           90.9    1-1-accuracy
uber-text   CLIP4STR-L (DataComp-1B)      92.2    accuracy
host        CLIP4STR-L                    82.7    1-1-accuracy
msda        MetaSelf-Learning             42      accuracy
svt-p       ABINet-LV+TPS++               89.6    accuracy
ic13        ABINet-LV+TPS++               97.8    accuracy
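The accuracy figures listed here are percentages. This page does not state the evaluation protocol behind them, so the following is only a minimal sketch of how word-level accuracy is commonly computed for scene text recognition: a prediction counts as correct when it matches the reference string exactly, optionally after normalization. The function names (normalize, word_accuracy), the case-insensitive alphanumeric normalization, and the sample strings are all assumptions made for illustration, not the protocol of any benchmark above. The 1-1-accuracy metric is not defined on this page and is not covered by the sketch.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits.

    Assumption: a case-insensitive, alphanumeric-only comparison.
    Individual benchmarks may use a different rule.
    """
    return re.sub(r"[^a-z0-9]", "", text.lower())


def word_accuracy(predictions, references, normalize_text=True):
    """Return the percentage of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sample is required")

    correct = 0
    for pred, ref in zip(predictions, references):
        if normalize_text:
            pred, ref = normalize(pred), normalize(ref)
        # A word counts only if the whole string matches; no partial credit.
        correct += pred == ref
    return 100.0 * correct / len(references)


# Hypothetical model outputs and ground-truth labels.
preds = ["STARBUCKS", "exit", "0pen", "sale!"]
refs = ["Starbucks", "EXIT", "open", "SALE"]

print(word_accuracy(preds, refs))                        # normalized comparison
print(word_accuracy(preds, refs, normalize_text=False))  # strict comparison
```

Because the choice of normalization changes the score on the same predictions, numbers reported under different protocols are not directly comparable.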
Related Tasks
General OCR Capabilities
Comprehensive benchmarks covering multiple aspects of OCR performance.
Polish OCR
OCR for Polish language including historical documents, gothic fonts, and diacritic recognition.
Image Classification
Categorizing images into predefined classes (ImageNet, CIFAR).
Object Detection
Locating and classifying objects in images (COCO, Pascal VOC).