OCRBench v2

South China University of Technology

OCRBench v2 tests 8 core OCR capabilities across 23 tasks, evaluating large multimodal models (LMMs) on text recognition, text referring, information extraction, and related skills.
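The overall numbers in the leaderboards below are averages over per-task scores. As a minimal sketch (not the official OCRBench v2 evaluator, which applies task-specific metrics before averaging), the aggregation step looks like:

```python
def overall_score(task_scores):
    """Average per-task scores (each already on a 0-100 scale)
    into a single leaderboard number, rounded to one decimal."""
    if not task_scores:
        raise ValueError("no task scores given")
    return round(sum(task_scores.values()) / len(task_scores), 1)

# Hypothetical per-task scores for one model (illustrative values only):
scores = {
    "text_recognition": 71.2,
    "text_referring": 55.4,
    "information_extraction": 64.5,
}
print(overall_score(scores))  # → 63.7
```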

Benchmark Stats

- Models: 48
- Papers: 74
- Metrics: 4

SOTA History

Overall (Chinese)

Average score on Chinese private test set

Higher is better

| Rank | Model | Source | Score | Year | Notes |
|------|-------|--------|-------|------|-------|
| 1 | Qwen2.5-VL-72B | Community | 63.7 | 2025 | From the Qwen2.5-VL-72B-Instruct model card benchmark table. |
| 2 | gemini-25-pro | Editorial | 62.2 | 2025 | Chinese, private split. |
| 3 | Qianfan-OCR | Editorial | 60.77 | 2025 | Baidu Qianfan-OCR 4B (Qwen3-4B + Qianfan-ViT), Apache 2.0, 192 languages, Layout-as-Thought. |
| 4 | minicpm-v-4.5-8b | Editorial | 58.8 | 2025 | Chinese, private split. |
| 5 | sail-vl2-8b | Editorial | 57.6 | 2025 | |
| 6 | claude-3.5-sonnet | Editorial | 48.4 | 2024 | |
| 7 | InternVL2.5-78B | Community | 46.2 | 2025 | From the Qwen2.5-VL-72B-Instruct model card comparison table. |
| 8 | Qwen2-VL-72B | Community | 46.1 | 2024 | From the Qwen2.5-VL-72B-Instruct model card comparison table. |
| 9 | gpt-4o-2024 | Editorial | 45.7 | 2024 | |

Overall (English)

Average score on English private test set

Higher is better

| Rank | Model | Source | Score | Year | Notes |
|------|-------|--------|-------|------|-------|
| 1 | seed-1.6-vision | Editorial | 62.2 | 2025 | English, private split. |
| 2 | Qwen2.5-VL-72B | Community | 61.5 | 2025 | From the Qwen2.5-VL-72B-Instruct model card benchmark table (HF: Qwen/Qwen2.5-VL-72B-Instruct). |
| 3 | qwen3-omni-30b | Editorial | 61.3 | 2025 | |
| 4 | nemotron-nano-v2-vl | Editorial | 61.2 | 2025 | |
| 5 | gemini-25-pro | Editorial | 59.3 | 2025 | |
| 6 | llama-3.1-nemotron-nano-vl-8b | Editorial | 56.4 | 2025 | |
| 7 | Qianfan-OCR | Editorial | 56 | 2025 | Baidu Qianfan-OCR 4B (Qwen3-4B + Qianfan-ViT), Apache 2.0, 192 languages, Layout-as-Thought. |
| 8 | gpt-4o | Editorial | 55.5 | 2024 | Listed as GPT5-2025-08-07 on the leaderboard. |
| 9 | ovis2.5-8b | Editorial | 54.1 | 2025 | |
| 10 | gemini-1.5-pro | Editorial | 51.6 | 2024 | |
| 11 | sail-vl2-8b | Editorial | 49.3 | 2025 | |
| 12 | minicpm-v-4.5-8b | Editorial | 48.4 | 2025 | |
| 13 | Qwen2-VL-72B | Community | 47.8 | 2024 | From the Qwen2.5-VL-72B-Instruct model card comparison table. |
| 14 | gpt-4o-2024 | Editorial | 47.6 | 2024 | GPT-4o baseline (not GPT5-2025-08-07). |
| 15 | claude-3.5-sonnet | Editorial | 47.5 | 2024 | |
| 16 | internvl3.5-14b | Editorial | 47.1 | 2025 | |
| 17 | step-1v | Editorial | 46.8 | 2024 | |
| 18 | InternVL2.5-78B | Community | 45 | 2025 | From the Qwen2.5-VL-72B-Instruct model card comparison table. |
| 19 | grok4 | Editorial | 45 | 2025 | |
| 20 | gpt-4o-mini | Editorial | 44.1 | 2024 | |
| 21 | claude-sonnet-4 | Editorial | 42.4 | 2025 | claude-sonnet-4-20250514. |
| 22 | qwen2.5-vl-7b | Editorial | 41.8 | 2025 | |
| 23 | deepseek-vl2-small | Editorial | 41 | 2024 | |
| 24 | pixtral-12b | Editorial | 38.4 | 2024 | |
| 25 | phi-4-multimodal | Editorial | 38.1 | 2025 | |
| 26 | glm-4v-9b | Editorial | 37.1 | 2024 | |
| 27 | molmo-7b | Editorial | 33.9 | 2024 | |
| 28 | llava-ov-7b | Editorial | 33.7 | 2024 | |
| 29 | idefics3-8b | Editorial | 26 | 2024 | |
| 30 | mistral-ocr-2512 | Editorial | 25.2 | 2024 | Verified via the CodeSOTA benchmark (7,400 English samples). A pure text-extraction OCR model, not designed for VQA, chart parsing, or structured extraction; strong on full-page OCR (79.1%) and document parsing (55.2%). |
| 31 | docowl2 | Editorial | 23.4 | 2024 | |

Overall (Chinese, Public)

Average score on Chinese public test set

Higher is better

All scores from Table 3 of arXiv:2501.00321.

| Rank | Model | Source | Score | Year |
|------|-------|--------|-------|------|
| 1 | InternVL3-14B | Community | 55.7 | 2025 |
| 2 | Qwen2.5-VL-7B | Community | 55.6 | 2025 |
| 3 | Ovis2-8B | Community | 49.2 | 2025 |
| 4 | Gemini 1.5 Pro | Community | 43.1 | 2024 |
| 5 | DeepSeek-VL2-Small | Community | 42.7 | 2024 |
| 6 | Step-1V | Community | 42.6 | 2024 |
| 7 | MiniCPM-o-2.6 | Community | 41.1 | 2024 |
| 8 | Claude 3.5 Sonnet | Community | 39.6 | 2024 |
| 9 | GLM-4V-9B | Community | 36.6 | 2024 |
| 10 | GPT-4o | Community | 32.2 | 2024 |
| 11 | LLaVA-OneVision-7B | Community | 17.8 | 2024 |
| 12 | TextMonkey | Community | 15.8 | 2024 |
| 13 | Pixtral-12B | Community | 14.6 | 2024 |
| 14 | Monkey | Community | 13.1 | 2024 |
| 15 | Molmo-7B | Community | 12.8 | 2024 |
| 16 | Cambrian-1-8B | Community | 9.9 | 2024 |
| 17 | LLaVA-NeXT-8B | Community | 9.1 | 2024 |

Overall (English, Public)

Average score on English public test set

Higher is better

All scores from Table 2 of arXiv:2501.00321.

| Rank | Model | Source | Score | Year | Notes |
|------|-------|--------|-------|------|-------|
| 1 | InternVL3-14B | Community | 52.6 | 2025 | |
| 2 | Gemini 1.5 Pro | Community | 51.9 | 2024 | |
| 3 | Ovis2-8B | Community | 47.7 | 2025 | |
| 4 | Qwen2.5-VL-7B | Community | 46.7 | 2025 | Same average as Step-1V (46.7). |
| 5 | Step-1V | Community | 46.7 | 2024 | |
| 6 | GPT-4o | Community | 46.5 | 2024 | |
| 7 | Claude 3.5 Sonnet | Community | 45.2 | 2024 | claude-3-5-sonnet-20241022. |
| 8 | MiniCPM-o-2.6 | Community | 45.1 | 2024 | |
| 9 | DeepSeek-VL2-Small | Community | 43.3 | 2024 | |
| 10 | GLM-4V-9B | Community | 42.6 | 2024 | |
| 11 | Pixtral-12B | Community | 40.3 | 2024 | |
| 12 | LLaVA-OneVision-7B | Community | 36.4 | 2024 | |
| 13 | Cambrian-1-8B | Community | 34.7 | 2024 | |
| 14 | Molmo-7B | Community | 34.5 | 2024 | |
| 15 | LLaVA-NeXT-8B | Community | 31.5 | 2024 | |
| 16 | TextMonkey | Community | 23.9 | 2024 | |
| 17 | Monkey | Community | 23.1 | 2024 | |
