OCRBench v2
South China University of Technology
Tests 8 core OCR capabilities across 23 tasks, evaluating large multimodal models (LMMs) on text recognition, referring, extraction, and related skills.
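Each leaderboard below reports an average score over the benchmark's tasks for one language split. As a minimal sketch, assuming an unweighted mean over per-task scores on a 0-100 scale (the exact weighting is defined by the official evaluation scripts, and the task names here are illustrative, not the benchmark's actual task identifiers):

```python
def overall_score(task_scores: dict[str, float]) -> float:
    """Unweighted mean of per-task scores (0-100 scale) for one model."""
    return sum(task_scores.values()) / len(task_scores)

# Hypothetical per-task results for a single model on one split.
scores = {
    "text_recognition": 71.2,
    "text_referring": 55.0,
    "relation_extraction": 48.1,
}
print(round(overall_score(scores), 1))  # → 58.1
```

Because the overall number is an average, a model can lead the table while trailing on individual capabilities; pure-OCR models in particular score well on recognition tasks but lose points on VQA-style tasks they were never designed for.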
Overall (Chinese)
Average score on Chinese private test set
Higher is better
| Rank | Model | Source | Score | Year | Paper |
|---|---|---|---|---|---|
| 1 | Qwen2.5-VL-72B From Qwen2.5-VL-72B-Instruct model card benchmark table. | Community | 63.7 | 2025 | Source |
| 2 | gemini-25-pro Chinese, Private split. | Editorial | 62.2 | 2025 | Source |
| 3 | Qianfan-OCR Baidu Qianfan-OCR 4B (Qwen3-4B + Qianfan-ViT), Apache 2.0, 192 langs. Layout-as-Thought. | Editorial | 60.77 | 2025 | Source |
| 4 | minicpm-v-4.5-8b Chinese, Private split. #4 overall | Editorial | 58.8 | 2025 | Source |
| 5 | sail-vl2-8b | Editorial | 57.6 | 2025 | Source |
| 6 | claude-3.5-sonnet | Editorial | 48.4 | 2024 | Source |
| 7 | InternVL2.5-78B From Qwen2.5-VL-72B-Instruct model card comparison table. | Community | 46.2 | 2025 | Source |
| 8 | Qwen2-VL-72B From Qwen2.5-VL-72B-Instruct model card comparison table. | Community | 46.1 | 2024 | Source |
| 9 | gpt-4o-2024 | Editorial | 45.7 | 2024 | Source |
Overall (English)
Average score on English private test set
Higher is better
| Rank | Model | Source | Score | Year | Paper |
|---|---|---|---|---|---|
| 1 | seed-1.6-vision English, Private split. #1 on the English private split | Editorial | 62.2 | 2025 | Source |
| 2 | Qwen2.5-VL-72B From Qwen2.5-VL-72B-Instruct model card benchmark table. HF: Qwen/Qwen2.5-VL-72B-Instruct. | Community | 61.5 | 2025 | Source |
| 3 | qwen3-omni-30b | Editorial | 61.3 | 2025 | Source |
| 4 | nemotron-nano-v2-vl | Editorial | 61.2 | 2025 | Source |
| 5 | gemini-25-pro | Editorial | 59.3 | 2025 | Source |
| 6 | llama-3.1-nemotron-nano-vl-8b | Editorial | 56.4 | 2025 | Source |
| 7 | Qianfan-OCR Baidu Qianfan-OCR 4B (Qwen3-4B + Qianfan-ViT), Apache 2.0, 192 langs. Layout-as-Thought. | Editorial | 56.0 | 2025 | Source |
| 8 | gpt-5 Listed as GPT5-2025-08-07 on leaderboard | Editorial | 55.5 | 2025 | Source |
| 9 | ovis2.5-8b | Editorial | 54.1 | 2025 | Source |
| 10 | gemini-1.5-pro | Editorial | 51.6 | 2024 | Source |
| 11 | sail-vl2-8b | Editorial | 49.3 | 2025 | Source |
| 12 | minicpm-v-4.5-8b | Editorial | 48.4 | 2025 | Source |
| 13 | Qwen2-VL-72B From Qwen2.5-VL-72B-Instruct model card comparison table. | Community | 47.8 | 2024 | Source |
| 14 | gpt-4o-2024 GPT-4o baseline (not GPT5-2025-08-07) | Editorial | 47.6 | 2024 | Source |
| 15 | claude-3.5-sonnet | Editorial | 47.5 | 2024 | Source |
| 16 | internvl3.5-14b | Editorial | 47.1 | 2025 | Source |
| 17 | step-1v | Editorial | 46.8 | 2024 | Source |
| 18 | InternVL2.5-78B From Qwen2.5-VL-72B-Instruct model card comparison table. | Community | 45.0 | 2025 | Source |
| 19 | grok4 | Editorial | 45.0 | 2025 | Source |
| 20 | gpt-4o-mini | Editorial | 44.1 | 2024 | Source |
| 21 | claude-sonnet-4 Claude-sonnet-4-20250514 | Editorial | 42.4 | 2025 | Source |
| 22 | qwen2.5-vl-7b | Editorial | 41.8 | 2025 | Source |
| 23 | deepseek-vl2-small | Editorial | 41.0 | 2024 | Source |
| 24 | pixtral-12b | Editorial | 38.4 | 2024 | Source |
| 25 | phi-4-multimodal | Editorial | 38.1 | 2025 | Source |
| 26 | glm-4v-9b | Editorial | 37.1 | 2024 | Source |
| 27 | molmo-7b | Editorial | 33.9 | 2024 | Source |
| 28 | llava-ov-7b | Editorial | 33.7 | 2024 | Source |
| 29 | idefics3-8b | Editorial | 26.0 | 2024 | Source |
| 30 | mistral-ocr-2512 Verified via CodeSOTA benchmark (7,400 English samples). Mistral OCR is a pure OCR model (text extraction only), not designed for VQA, chart parsing, or structured extraction; strong on full-page OCR (79.1%) and document parsing (55.2%). | Editorial | 25.2 | 2025 | Source |
| 31 | docowl2 | Editorial | 23.4 | 2024 | Source |
Overall (Chinese, Public)
Average score on Chinese public test set
Higher is better
| Rank | Model | Source | Score | Year | Paper |
|---|---|---|---|---|---|
| 1 | InternVL3-14B Table 3, arxiv:2501.00321. Highest on Chinese public split, narrowly ahead of Qwen2.5-VL-7B (55.6). | Community | 55.7 | 2025 | Source |
| 2 | Qwen2.5-VL-7B Table 3, arxiv:2501.00321. | Community | 55.6 | 2025 | Source |
| 3 | Ovis2-8B Table 3, arxiv:2501.00321. | Community | 49.2 | 2025 | Source |
| 4 | Gemini 1.5 Pro Table 3, arxiv:2501.00321. | Community | 43.1 | 2024 | Source |
| 5 | DeepSeek-VL2-Small Table 3, arxiv:2501.00321. | Community | 42.7 | 2024 | Source |
| 6 | Step-1V Table 3, arxiv:2501.00321. | Community | 42.6 | 2024 | Source |
| 7 | MiniCPM-o-2.6 Table 3, arxiv:2501.00321. | Community | 41.1 | 2024 | Source |
| 8 | Claude 3.5 Sonnet Table 3, arxiv:2501.00321. | Community | 39.6 | 2024 | Source |
| 9 | GLM-4V-9B Table 3, arxiv:2501.00321. | Community | 36.6 | 2024 | Source |
| 10 | GPT-4o Table 3, arxiv:2501.00321. | Community | 32.2 | 2024 | Source |
| 11 | LLaVA-OneVision-7B Table 3, arxiv:2501.00321. | Community | 17.8 | 2024 | Source |
| 12 | TextMonkey Table 3, arxiv:2501.00321. | Community | 15.8 | 2024 | Source |
| 13 | Pixtral-12B Table 3, arxiv:2501.00321. | Community | 14.6 | 2024 | Source |
| 14 | Monkey Table 3, arxiv:2501.00321. | Community | 13.1 | 2024 | Source |
| 15 | Molmo-7B Table 3, arxiv:2501.00321. | Community | 12.8 | 2024 | Source |
| 16 | Cambrian-1-8B Table 3, arxiv:2501.00321. | Community | 9.9 | 2024 | Source |
| 17 | LLaVA-NeXT-8B Table 3, arxiv:2501.00321. | Community | 9.1 | 2024 | Source |
Overall (English, Public)
Average score on English public test set
Higher is better
| Rank | Model | Source | Score | Year | Paper |
|---|---|---|---|---|---|
| 1 | InternVL3-14B Highest score on English public split. Table 2, arxiv:2501.00321. | Community | 52.6 | 2025 | Source |
| 2 | Gemini 1.5 Pro Table 2, arxiv:2501.00321. Gemini-1.5-Pro. | Community | 51.9 | 2024 | Source |
| 3 | Ovis2-8B Table 2, arxiv:2501.00321. | Community | 47.7 | 2025 | Source |
| 4 | Qwen2.5-VL-7B Table 2, arxiv:2501.00321. Same as Step-1V average (46.7). | Community | 46.7 | 2025 | Source |
| 5 | Step-1V Table 2, arxiv:2501.00321. | Community | 46.7 | 2024 | Source |
| 6 | GPT-4o Table 2, arxiv:2501.00321. | Community | 46.5 | 2024 | Source |
| 7 | Claude 3.5 Sonnet Table 2, arxiv:2501.00321. claude-3-5-sonnet-20241022. | Community | 45.2 | 2024 | Source |
| 8 | MiniCPM-o-2.6 Table 2, arxiv:2501.00321. | Community | 45.1 | 2024 | Source |
| 9 | DeepSeek-VL2-Small Table 2, arxiv:2501.00321. | Community | 43.3 | 2024 | Source |
| 10 | GLM-4V-9B Table 2, arxiv:2501.00321. | Community | 42.6 | 2024 | Source |
| 11 | Pixtral-12B Table 2, arxiv:2501.00321. | Community | 40.3 | 2024 | Source |
| 12 | LLaVA-OneVision-7B Table 2, arxiv:2501.00321. | Community | 36.4 | 2024 | Source |
| 13 | Cambrian-1-8B Table 2, arxiv:2501.00321. | Community | 34.7 | 2024 | Source |
| 14 | Molmo-7B Table 2, arxiv:2501.00321. | Community | 34.5 | 2024 | Source |
| 15 | LLaVA-NeXT-8B Table 2, arxiv:2501.00321. | Community | 31.5 | 2024 | Source |
| 16 | TextMonkey Table 2, arxiv:2501.00321. | Community | 23.9 | 2024 | Source |
| 17 | Monkey Table 2, arxiv:2501.00321. | Community | 23.1 | 2024 | Source |