General OCR Capabilities
Comprehensive benchmarks covering multiple aspects of OCR performance.
Evaluating general OCR capabilities is a key task in computer vision. The standard benchmarks used to evaluate models are listed below, along with current state-of-the-art results.
Benchmarks & SOTA
OCRBench v2
Tests 8 core OCR capabilities across 23 tasks. Evaluates large multimodal models (LMMs) on capabilities including text recognition, referring, and extraction.
State of the Art
Seed1.6-vision
ByteDance
62.2
overall-en-private
CC-OCR
Comprehensive Challenge OCR
Multi-scene text reading, key information extraction, multilingual text, and document parsing benchmark.
State of the Art
Gemini 1.5 Pro
83.25
multi-scene-f1
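For readers unfamiliar with F1-style OCR metrics, the following is a minimal sketch of a token-level F1 between a predicted and a reference transcription. It assumes whitespace tokenization and case-sensitive matching; the function name `token_f1` is hypothetical, and the benchmark's official scorer may normalize and tokenize text differently. The function returns a value in [0, 1], whereas the score listed above appears to be on a 0-100 scale.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between predicted and reference text.

    Assumption: tokens are whitespace-separated and compared exactly.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; only one empty is a total miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each token is credited at most as many times
    # as it occurs in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A prediction that misses one of three reference words, such as `token_f1("OPEN 24", "OPEN 24 HOURS")`, has perfect precision but a recall of 2/3, so it scores lower than an exact match.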
MME-VideoOCR
MME Video OCR Benchmark
1,464 videos with 2,000 QA pairs across 25 tasks. Tests OCR capabilities in video content.
State of the Art
Gemini 2.5 Pro
73.7
total-accuracy
reVISION
reVISION Polish Vision-Language Benchmark
Polish benchmark for vision-language models including OCR evaluation on educational exam materials. Covers middle school, high school, and professional exams.
No results tracked yet
Related Tasks
Polish OCR
OCR for Polish language including historical documents, gothic fonts, and diacritic recognition.
Image Classification
Categorizing images into predefined classes (ImageNet, CIFAR).
Object Detection
Locating and classifying objects in images (COCO, Pascal VOC).
Semantic Segmentation
Pixel-level classification of images (Cityscapes, ADE20K).