
ocrbench.

OCR benchmark

§ 01 · score

score.

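Higher is better. Scores are on the OCRBench 0-1000 scale; several rows were converted from sources that report on a 0-100 scale.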

# · Model · Score · Source
Notes

1 · qwen3-5-397b-a17b · 931 · paperswithcode-public-api
Official Qwen3.5 blog (https://qwen.ai/blog?id=qwen3.5). Vision table row OCRBench; linked to OCR task using existing Score metric; PWC evaluation id 1060; paper: Qwen3.5: Towards Native Multimodal Agents

2 · Kimi K2.5 · 923 · paperswithcode-public-api
OCRBench overall score on the native 0-1000 scale (card reports 92.3 normalized); max 64k tokens; avg@3; Thinking mode; PWC evaluation id 1252; paper: Kimi K2.5: Visual Agentic Intelligence

3 · qwen3-vl-235b-a22b-instruct · 920 · paperswithcode-public-api
Table 2 of Qwen3-VL technical report (arXiv:2511.21631), OCRBench (rescaled from 87.5/92.0 to the 0-1000 scale used by the existing rows); PWC evaluation id 4749; paper: Qwen3-VL Technical Report

4 · sensenova-u1-a3b-mot · 919 · paperswithcode-public-api
PWC OCRBench Score normalized to the 0-1000 convention; Paper Table 3; SenseNova-U1-A3B-MoT Think mode on OCRBench; PWC evaluation id 5612; paper: SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

5 · qwen3-5-omni-plus · 913 · paperswithcode-public-api
PWC OCRBench Score normalized to the 0-1000 convention; Paper Table 6, Vision->Text, OCRBench document understanding benchmark; PWC evaluation id 5329; paper: Qwen3.5-Omni Technical Report

6 · internvl3-78b · 906 · paperswithcode-public-api
PWC evaluation id 768; paper: InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

7 · qwen3-6-35b-a3b · 900 · paperswithcode-public-api
PWC OCRBench Score normalized to the 0-1000 convention; OCRBench row from Qwen3.6 model-card Vision benchmark table; imported as configured OCRBench Score. Source: Qwen3.6-27B Hugging Face model card benchmark table (https://huggingface.co/Qwen/Qwen3.6-27B); PWC evaluation id 5562; paper: Qwen3.6

8 · qwen3-vl-8b-instruct · 896 · paperswithcode-public-api
Reported on OCRBench (raw 0-1000 score) by Qwen3-VL-8B-Instruct model card; PWC evaluation id 1279; paper: Qwen3-VL Technical Report

9 · qwen3-6-27b · 894 · paperswithcode-public-api
PWC OCRBench Score normalized to the 0-1000 convention; OCRBench row from Qwen3.6 model-card Vision benchmark table; imported as configured OCRBench Score. Source: Qwen3.6-27B Hugging Face model card benchmark table (https://huggingface.co/Qwen/Qwen3.6-27B); PWC evaluation id 5563; paper: Qwen3.6

10 · Qwen2.5-VL 72B · 885 · paperswithcode-public-api
Table 5, OCRBench. Source: Qwen2.5-VL Technical Report (arXiv:2502.13923). Model: Qwen2.5-VL-72B; PWC evaluation id 5023; paper: Qwen2.5-VL Technical Report

11 · Qianfan-OCR · 880 · paperswithcode-public-api
OCRBench standard Score (0-1000): 880; PWC evaluation id 1197; paper: Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

12 · ovis2-5-9b · 879 · paperswithcode-public-api
PWC OCRBench Score normalized to the 0-1000 convention; Table 3, OpenCompass suite; OCRBench (OCR). Source/provenance: Ovis2.5 Technical Report; source arXiv paper https://arxiv.org/abs/2508.11737; official HF model URL https://huggingface.co/AIDC-AI/Ovis2.5-9B; PWC evaluation id 5583; paper: Ovis2.5 Technical Report

13 · Qwen2-VL 72B · 877 · paperswithcode-public-api
PWC evaluation id 143; paper: Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

14 · minicpm-o-4-5-instruct · 876 · paperswithcode-public-api
Instruct mode from the openbmb/MiniCPM-o-4_5 Hugging Face model card (https://huggingface.co/openbmb/MiniCPM-o-4_5); 9B params; results reported in instruct mode/variant; from the 'Image Understanding (Instruct)' table; metric label in card: OCRBench; 0-1000 scale used by other rows on this leaderboard; PWC evaluation id 1171; paper: MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

15 · qwen3-vl-235b-a22b-thinking · 875 · paperswithcode-public-api
Table 2 of Qwen3-VL technical report (arXiv:2511.21631), OCRBench (rescaled from 87.5/92.0 to the 0-1000 scale used by the existing rows); PWC evaluation id 4748; paper: Qwen3-VL Technical Report

16 · kimi-vl-a3b-thinking-2506 · 869 · paperswithcode-public-api
Kimi-VL-A3B-Thinking-2506 on OCRBench overall score (raw 0-1000 scale) from the moonshotai/Kimi-VL-A3B-Thinking-2506 HF model card; PWC evaluation id 3371; paper: Kimi-VL Technical Report

17 · kimi-vl-a3b-instruct · 867 · paperswithcode-public-api
Kimi-VL-A3B-Instruct on OCRBench overall score (raw 0-1000 scale) from Kimi-VL Technical Report Table 3 and the moonshotai/Kimi-VL-A3B-Instruct HF model card; PWC evaluation id 3351; paper: Kimi-VL Technical Report

18 · qwen2-vl-7b · 866 · paperswithcode-public-api
PWC evaluation id 144; paper: Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

19 · minimax-vl-01 · 865 · paperswithcode-public-api
PWC evaluation id 886; paper: MiniMax-01: Scaling Foundation Models with Lightning Attention

20 · Qwen2.5-VL-7B · 864 · paperswithcode-public-api
Paper table; source label OCRBench; metric reported as Score. Imported while expanding from ScreenSpot-Pro source papers; PWC evaluation id 5377; paper: Qwen2.5-VL Technical Report

21 · infinity-parser2-pro · 862 · paperswithcode-public-api
OCRBench full benchmark score from the Infinity-Parser2-Pro Hugging Face card / GitHub performance table. Source reports 86.20 on a 0-100 scale; stored as 862.0 on the 0-1000 OCRBench Score convention used by existing rows; PWC evaluation id 4966; paper: Infinity-Parser2-Pro

22 · dots.mocr · 860 · paperswithcode-public-api
OCRBench overall score from the dots.mocr Hugging Face model card (section 3, General Vision Tasks). The card reports 86.0 on a 0-100 scale; converted to 860 on the standard 0-1000 OCRBench scale used by other rows on this leaderboard (consistent with Qwen3-VL-2B = 85.8 -> 858 in the same table); PWC evaluation id 1153; paper: Multimodal OCR: Parse Anything from Documents

23 · hunyuanocr-1b · 860 · paperswithcode-public-api
PWC evaluation id 957; paper: HunyuanOCR Technical Report

24 · minicpm-v-4-6-thinking-16x · 831 · paperswithcode-public-api
Thinking mode from the MiniCPM-V 4.6 Hugging Face model card; official checkpoint; visual token compression ratio 16x; metric label in card: OCRBench; PWC evaluation id 1115; paper: A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone

25 · videollama3-7b · 828 · paperswithcode-public-api
DAMO-NLP-SG/VideoLLaMA3-7B-Image checkpoint; numbers from the 7B-Image model card main-results table; PWC evaluation id 1214; paper: VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

26 · qwen2-vl-2b · 809 · paperswithcode-public-api
PWC evaluation id 145; paper: Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

27 · zaya1-vl-8b · 798 · paperswithcode-public-api
OCRBench overall score (0-1000 scale; card reports 79.8 normalised). Reported in the ZAYA1-VL-8B technical report (Zyphra). Evaluated on the Zyphra eval harness based on VLMEvalKit; PWC evaluation id 1230; paper: ZAYA1-VL-8B Technical Report

28 · qwen2-5-vl-3b · 797 · paperswithcode-public-api
Paper table; source label OCRBench; metric reported as Score. Imported while expanding from ScreenSpot-Pro source papers; PWC evaluation id 5378; paper: Qwen2.5-VL Technical Report

29 · videollama3-2b · 779 · paperswithcode-public-api
PWC evaluation id 115; paper: VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

30 · minicpm-llama3-v-2-5 · 725 · paperswithcode-public-api
Paper Table 5 OCR benchmark result for MiniCPM-Llama3-V 2.5; source reports OCRBench score; PWC evaluation id 5183; paper: MiniCPM-V: A GPT-4V Level MLLM on Your Phone
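
Several of the notes above mention converting a score reported on a 0-100 scale to the 0-1000 OCRBench convention (for example 86.20 stored as 862.0, or 92.3 shown as 923). The snippet below is a minimal sketch of that rescaling, not the leaderboard's actual import code; the function name `to_ocrbench_scale` and its `source_max` parameter are hypothetical and chosen only for illustration.

```python
def to_ocrbench_scale(value: float, source_max: float = 100.0) -> float:
    """Rescale a reported score to the 0-1000 OCRBench convention.

    Assumption: the caller knows which scale the source used and passes
    it explicitly (100 for normalized cards, 1000 for raw scores).
    """
    if source_max not in (100.0, 1000.0):
        raise ValueError("source_max must be 100 or 1000")
    if not 0 <= value <= source_max:
        raise ValueError(f"value {value} is outside 0-{source_max:g}")
    # Round to one decimal to hide floating-point noise from the multiply.
    return round(value * (1000.0 / source_max), 1)


# Values taken from the notes in the table above.
reported = {
    "infinity-parser2-pro": (86.20, 100.0),
    "dots.mocr": (86.0, 100.0),
    "Qianfan-OCR": (880, 1000.0),
}
for model, (value, scale) in reported.items():
    print(model, to_ocrbench_scale(value, scale))
```

Passing the source scale explicitly, rather than guessing it from the magnitude of the value, avoids misreading a very low raw score as a normalized one.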