| 01 | qwen3-5-397b-a17b Official Qwen3.5 blog (https://qwen.ai/blog?id=qwen3.5). Vision table row OCRBench; linked to OCR task using existing Score metric; PWC evaluation id 1060; paper: Qwen3.5: Towards Native Multimodal Agents | verified | 931 | 2026 |
| 02 | Kimi K2.5 OCRBench overall score on the native 0-1000 scale (card reports 92.3 normalized); max 64k tokens; avg@3; Thinking mode; PWC evaluation id 1252; paper: Kimi K2.5: Visual Agentic Intelligence | verified | 923 | 2026 |
| 03 | qwen3-vl-235b-a22b-instruct Table 2 of Qwen3-VL technical report (arXiv:2511.21631), OCRBench (rescaled from 87.5/92.0 to the 0-1000 scale used by the existing rows); PWC evaluation id 4749; paper: Qwen3-VL Technical Report | verified | 920 | 2026 |
| 04 | sensenova-u1-a3b-mot PWC OCRBench Score normalized to the 0-1000 convention; Paper Table 3; SenseNova-U1-A3B-MoT Think mode on OCRBench; PWC evaluation id 5612; paper: SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture | verified | 919 | 2026 |
| 05 | qwen3-5-omni-plus PWC OCRBench Score normalized to the 0-1000 convention; Paper Table 6, Vision->Text, OCRBench document understanding benchmark; PWC evaluation id 5329; paper: Qwen3.5-Omni Technical Report | verified | 913 | 2026 |
| 06 | internvl3-78b PWC evaluation id 768; paper: InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | verified | 906 | 2026 |
| 07 | qwen3-6-35b-a3b PWC OCRBench Score normalized to the 0-1000 convention; OCRBench row from Qwen3.6 model-card Vision benchmark table; imported as configured OCRBench Score. Source: Qwen3.6-27B Hugging Face model card benchmark table (https://huggingface.co/Qwen/Qwen3.6-27B); PWC evaluation id 5562; paper: Qwen3.6 | verified | 900 | 2026 |
| 08 | qwen3-vl-8b-instruct Reported on OCRBench (raw 0-1000 score) by Qwen3-VL-8B-Instruct model card; PWC evaluation id 1279; paper: Qwen3-VL Technical Report | verified | 896 | 2026 |
| 09 | qwen3-6-27b PWC OCRBench Score normalized to the 0-1000 convention; OCRBench row from Qwen3.6 model-card Vision benchmark table; imported as configured OCRBench Score. Source: Qwen3.6-27B Hugging Face model card benchmark table (https://huggingface.co/Qwen/Qwen3.6-27B); PWC evaluation id 5563; paper: Qwen3.6 | verified | 894 | 2026 |
| 10 | Qwen2.5-VL 72B Table 5, OCRBench. Source: Qwen2.5-VL Technical Report (arXiv:2502.13923). Model: Qwen2.5-VL-72B; PWC evaluation id 5023; paper: Qwen2.5-VL Technical Report | verified | 885 | 2026 |
| 11 | Qianfan-OCR OCRBench standard Score (0-1000); 880; PWC evaluation id 1197; paper: Qianfan-OCR: A Unified End-to-End Model for Document Intelligence | verified | 880 | 2026 |
| 12 | ovis2-5-9b PWC OCRBench Score normalized to the 0-1000 convention; Table 3, OpenCompass suite; OCRBench (OCR). Source/provenance: Ovis2.5 Technical Report; source arXiv paper https://arxiv.org/abs/2508.11737; official HF model URL https://huggingface.co/AIDC-AI/Ovis2.5-9B; PWC evaluation id 5583; paper: Ovis2.5 Technical Report | verified | 879 | 2026 |
| 13 | Qwen2-VL 72B PWC evaluation id 143; paper: Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | verified | 877 | 2026 |
| 14 | minicpm-o-4-5-instruct Instruct mode from the openbmb/MiniCPM-o-4_5 Hugging Face model card (https://huggingface.co/openbmb/MiniCPM-o-4_5); 9B params; results reported in instruct mode/variant; from the 'Image Understanding (Instruct)' table; metric label in card: OCRBench; 0-1000 scale used by other rows on this leaderboard; PWC evaluation id 1171; paper: MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction | verified | 876 | 2026 |
| 15 | qwen3-vl-235b-a22b-thinking Table 2 of Qwen3-VL technical report (arXiv:2511.21631), OCRBench (rescaled from 87.5/92.0 to the 0-1000 scale used by the existing rows); PWC evaluation id 4748; paper: Qwen3-VL Technical Report | verified | 875 | 2026 |
| 16 | kimi-vl-a3b-thinking-2506 Kimi-VL-A3B-Thinking-2506 on OCRBench overall score (raw 0-1000 scale) from the moonshotai/Kimi-VL-A3B-Thinking-2506 HF model card; PWC evaluation id 3371; paper: Kimi-VL Technical Report | verified | 869 | 2026 |
| 17 | kimi-vl-a3b-instruct Kimi-VL-A3B-Instruct on OCRBench overall score (raw 0-1000 scale) from Kimi-VL Technical Report Table 3 and the moonshotai/Kimi-VL-A3B-Instruct HF model card; PWC evaluation id 3351; paper: Kimi-VL Technical Report | verified | 867 | 2026 |
| 18 | qwen2-vl-7b PWC evaluation id 144; paper: Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | verified | 866 | 2026 |
| 19 | minimax-vl-01 PWC evaluation id 886; paper: MiniMax-01: Scaling Foundation Models with Lightning Attention | verified | 865 | 2026 |
| 20 | Qwen2.5-VL-7B paper table; source label OCRBench; metric reported as Score. Imported while expanding from ScreenSpot-Pro source papers; PWC evaluation id 5377; paper: Qwen2.5-VL Technical Report | verified | 864 | 2026 |
| 21 | infinity-parser2-pro OCRBench full benchmark score from the Infinity-Parser2-Pro Hugging Face card / GitHub performance table. Source reports 86.20 on a 0-100 scale; stored as 862.0 on the 0-1000 OCRBench Score convention used by existing rows; PWC evaluation id 4966; paper: Infinity-Parser2-Pro | verified | 862 | 2026 |
| 22 | dots.mocr OCRBench overall score from the dots.mocr Hugging Face model card (section 3, General Vision Tasks). The card reports 86.0 on a 0-100 scale; converted to 860 on the standard 0-1000 OCRBench scale used by other rows on this leaderboard (consistent with Qwen3-VL-2B = 85.8 -> 858 in the same table); PWC evaluation id 1153; paper: Multimodal OCR: Parse Anything from Documents | verified | 860 | 2026 |
| 23 | hunyuanocr-1b PWC evaluation id 957; paper: HunyuanOCR Technical Report | verified | 860 | 2026 |
| 24 | minicpm-v-4-6-thinking-16x Thinking mode from the MiniCPM-V 4.6 Hugging Face model card; official checkpoint; visual token compression ratio 16x; metric label in card: OCRBench; PWC evaluation id 1115; paper: A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone | verified | 831 | 2026 |
| 25 | videollama3-7b DAMO-NLP-SG/VideoLLaMA3-7B-Image checkpoint; numbers from the 7B-Image model card main-results table; PWC evaluation id 1214; paper: VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | verified | 828 | 2026 |
| 26 | qwen2-vl-2b PWC evaluation id 145; paper: Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | verified | 809 | 2026 |
| 27 | zaya1-vl-8b OCRBench overall score (0-1000 scale; card reports 79.8 normalised). Reported in the ZAYA1-VL-8B technical report (Zyphra). Evaluated on the Zyphra eval harness based on VLMEvalKit; PWC evaluation id 1230; paper: ZAYA1-VL-8B Technical Report | verified | 798 | 2026 |
| 28 | qwen2-5-vl-3b paper table; source label OCRBench; metric reported as Score. Imported while expanding from ScreenSpot-Pro source papers; PWC evaluation id 5378; paper: Qwen2.5-VL Technical Report | verified | 797 | 2026 |
| 29 | videollama3-2b PWC evaluation id 115; paper: VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | verified | 779 | 2026 |
| 30 | minicpm-llama3-v-2-5 Paper Table 5 OCR benchmark result for MiniCPM-Llama3-V 2.5; source reports OCRBench score; PWC evaluation id 5183; paper: MiniCPM-V: A GPT-4V Level MLLM on Your Phone | verified | 725 | 2026 |
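
Several row notes above describe converting a score reported on a 0-100 scale to the 0-1000 OCRBench convention (for example, 86.20 stored as 862). The following is a minimal sketch of that arithmetic, not the leaderboard's actual import code; the function name `to_ocrbench_1000` and its parameters are hypothetical and chosen for illustration only.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_ocrbench_1000(reported: str, scale_max: int) -> int:
    """Convert a reported OCRBench score to the 0-1000 convention.

    `reported` is the score exactly as printed in the source, passed as a
    string so the decimal digits are preserved. `scale_max` is the top of
    the source scale and must be 100 or 1000. Both names are assumptions
    made for this sketch.
    """
    if scale_max not in (100, 1000):
        raise ValueError("scale_max must be 100 or 1000")

    value = Decimal(reported)
    if value < 0 or value > scale_max:
        raise ValueError(f"{reported} is outside the 0-{scale_max} scale")

    # Multiply by 10 for a 0-100 source, by 1 for a native 0-1000 source.
    scaled = value * (1000 // scale_max)

    # Round half up to the nearest whole point on the 0-1000 scale.
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Values taken from the row notes above.
print(to_ocrbench_1000("86.20", 100))   # infinity-parser2-pro
print(to_ocrbench_1000("92.3", 100))    # Kimi K2.5
print(to_ocrbench_1000("880", 1000))    # Qianfan-OCR, already on 0-1000
```

The scale is passed explicitly rather than guessed from the magnitude of the number, because a low raw score on the 0-1000 scale can fall inside the 0-100 range.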