Who leads the OCRBench v2 benchmark?

Qwen2.5-VL-72B currently leads OCRBench v2 with a score of 63.7 on Overall (Chinese).

What is the state-of-the-art score on OCRBench v2?

The state-of-the-art result on OCRBench v2 is 63.7 (Overall (Chinese)), achieved by Qwen2.5-VL-72B as of 2026.

How many models are tracked on OCRBench v2?

Codesota tracks 49 models on OCRBench v2 across 6 metrics.

When was the OCRBench v2 leaderboard last updated?

The OCRBench v2 leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2024.

South China University of Technology

OCRBench v2.

Name: OCRBench v2 Benchmark Results
Creator: South China University of Technology
Published: 2024-01-01
License: https://creativecommons.org/licenses/by/4.0/

OCRBench v2 tests 8 core OCR capabilities across 23 tasks, evaluating large multimodal models (LMMs) on text recognition, referring, and extraction.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

Overall (Chinese)

Overall (Chinese), listed as Overall Zh Private, is the overall score on the Chinese private split of OCRBench v2. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Overall (Chinese): verified, paper, vendor, community, unverified

Rank	Model	Trust	Score	Year	Notes
01	Qwen2.5-VL-72B	paper	63.7	2025	From Qwen2.5-VL-72B-Instruct model card benchmark table.
02	gemini-25-pro	paper	62.2	2025	Chinese, Private split. #1 on Chinese
03	Gemini 2.5 Pro	unverified	62.2	2025	Chinese, Private split. #1 on Chinese
04	Qianfan-OCR	paper	60.77	2025	Baidu Qianfan-OCR 4B (Qwen3-4B + Qianfan-ViT), Apache 2.0, 192 langs. Layout-as-Thought. #1 on zh
05	minicpm-v-4.5-8b	unverified	58.8	2025	Chinese, Private split. #4 overall
06	sail-vl2-8b	paper	57.6	2025
07	claude-3.5-sonnet	unverified	48.4	2024
08	InternVL2.5-78B	paper	46.2	2025	From Qwen2.5-VL-72B-Instruct model card comparison table.
09	Qwen2-VL-72B	paper	46.1	2024	From Qwen2.5-VL-72B-Instruct model card comparison table.
10	gpt-4o-2024	unverified	45.7	2024
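
Since higher is better on this metric, the leader is simply the highest-scoring row, optionally restricted to one trust tier. The following is a minimal Python sketch of that lookup, assuming the top five rows above have been transcribed by hand; the `Row` type and the `leader` helper are hypothetical names for illustration and are not part of any Codesota API.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    trust: str
    score: float
    year: int


# Top five Overall (Chinese) rows, transcribed from the table above.
ROWS = [
    Row("Qwen2.5-VL-72B", "paper", 63.7, 2025),
    Row("gemini-25-pro", "paper", 62.2, 2025),
    Row("Gemini 2.5 Pro", "unverified", 62.2, 2025),
    Row("Qianfan-OCR", "paper", 60.77, 2025),
    Row("minicpm-v-4.5-8b", "unverified", 58.8, 2025),
]


def leader(rows: list[Row], trust: Optional[str] = None) -> Optional[Row]:
    """Return the highest-scoring row, optionally limited to one trust tier."""
    candidates = [r for r in rows if trust is None or r.trust == trust]
    # max() keeps the first row on ties, so table order breaks equal scores.
    return max(candidates, key=lambda r: r.score) if candidates else None


if __name__ == "__main__":
    print(leader(ROWS))
    print(leader(ROWS, trust="unverified"))
```

Filtering by tier before taking the maximum makes it easy to see how the reported leader changes when only `paper` or only `unverified` entries are trusted.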

English Score

English Score is the reported evaluation metric for OCRBench v2. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for English Score: verified, paper, vendor, community, unverified

Rank	Model	Trust	Score	Year
01	Ovis2.5-9B	unverified	63.4	2025
02	Intern-S1-Pro	unverified	60.1	2026

Overall (English)

Overall (English), listed as Overall En Private, is the overall score on the English private split of OCRBench v2. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Overall (English): verified, paper, vendor, community, unverified

Rank	Model	Trust	Score	Year	Notes
01	seed-1.6-vision	paper	62.2	2025	English, Private split. #1 on OCRBench v2
02	Seed1.6-vision	unverified	62.2	2025	English, Private split. #1 on OCRBench v2
03	Qwen2.5-VL-72B	paper	61.5	2025	From Qwen2.5-VL-72B-Instruct model card benchmark table. HF: Qwen/Qwen2.5-VL-72B-Instruct.
04	qwen3-omni-30b	paper	61.3	2025
05	Nemotron Nano V2 VL	unverified	61.2	2025
06	nemotron-nano-v2-vl	paper	61.2	2025
07	gemini-25-pro	paper	59.3	2025
08	Gemini 2.5 Pro	unverified	59.3	2025
09	llama-3.1-nemotron-nano-vl-8b	paper	56.4	2025
10	Qianfan-OCR	paper	56	2025	Baidu Qianfan-OCR 4B (Qwen3-4B + Qianfan-ViT), Apache 2.0, 192 langs. Layout-as-Thought.
11	gpt-4o	paper	55.5	2024	Listed as GPT5-2025-08-07 on leaderboard
12	ovis2.5-8b	unverified	54.1	2025
13	gemini-1.5-pro	unverified	51.6	2024
14	sail-vl2-8b	paper	49.3	2025
15	minicpm-v-4.5-8b	unverified	48.4	2025
16	Qwen2-VL-72B	paper	47.8	2024	From Qwen2.5-VL-72B-Instruct model card comparison table.
17	gpt-4o-2024	paper	47.6	2024	GPT-4o baseline (not GPT5-2025-08-07)
18	claude-3.5-sonnet	paper	47.5	2024
19	internvl3.5-14b	unverified	47.1	2025
20	step-1v	unverified	46.8	2024
21	grok4	unverified	45	2025
22	InternVL2.5-78B	paper	45	2025	From Qwen2.5-VL-72B-Instruct model card comparison table.
23	GPT-4o mini	unverified	44.1	2024
24	gpt-4o-mini	paper	44.1	2024
25	Claude Sonnet 4	unverified	42.4	2025	Claude-sonnet-4-20250514
26	claude-sonnet-4	paper	42.4	2025	Claude-sonnet-4-20250514
27	qwen2.5-vl-7b	unverified	41.8	2025
28	deepseek-vl2-small	paper	41	2024
29	pixtral-12b	unverified	38.4	2024
30	phi-4-multimodal	unverified	38.1	2025
31	glm-4v-9b	unverified	37.1	2024
32	molmo-7b	unverified	33.9	2024
33	llava-ov-7b	paper	33.7	2024
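
Several rows in this table appear to record the same model twice under different spellings and trust tiers (for example `seed-1.6-vision` and `Seed1.6-vision`, both at 62.2). One way to surface such pairs is to group rows by a normalized name plus score. The sketch below is an illustration under that assumption; `normalize` and `likely_duplicates` are hypothetical helpers, and a shared normalized name and score only suggest, rather than prove, that two rows describe the same run.

```python
import re
from collections import defaultdict

# A few (model, trust, score) rows transcribed from the table above.
ROWS = [
    ("seed-1.6-vision", "paper", 62.2),
    ("Seed1.6-vision", "unverified", 62.2),
    ("Qwen2.5-VL-72B", "paper", 61.5),
    ("Nemotron Nano V2 VL", "unverified", 61.2),
    ("nemotron-nano-v2-vl", "paper", 61.2),
    ("grok4", "unverified", 45.0),
    ("InternVL2.5-78B", "paper", 45.0),
]


def normalize(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def likely_duplicates(rows):
    """Group rows whose normalized name and score both match."""
    groups = defaultdict(list)
    for model, trust, score in rows:
        groups[(normalize(model), score)].append((model, trust))
    # Only groups with more than one member are candidate duplicates.
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    for group in likely_duplicates(ROWS):
        print(group)
```

Keying on both name and score keeps unrelated models that happen to share a score, such as `grok4` and `InternVL2.5-78B` at 45, out of the same group.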

Chinese Score

Chinese Score is the reported evaluation metric for OCRBench v2. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Chinese Score: verified, paper, vendor, community, unverified

Rank	Model	Trust	Score	Year
01	Intern-S1-Pro	unverified	60.6	2026
02	Ovis2.5-9B	unverified	58	2025

Overall Zh Public

Overall Zh Public is the reported evaluation metric for OCRBench v2. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Overall Zh Public: verified, paper, vendor, community, unverified

Rank	Model	Trust	Score	Year	Notes
01	InternVL3-14B	paper	55.7	2025	Table 3, arxiv:2501.00321. Highest on Chinese public split (tied with Qwen2.5-VL-7B).
02	Qwen2.5-VL-7B	paper	55.6	2025	Table 3, arxiv:2501.00321.
03	Ovis2-8B	paper	49.2	2025	Table 3, arxiv:2501.00321.
04	Gemini 1.5 Pro	paper	43.1	2024	Table 3, arxiv:2501.00321.
05	DeepSeek-VL2-Small	paper	42.7	2024	Table 3, arxiv:2501.00321.
06	Step-1V	paper	42.6	2024	Table 3, arxiv:2501.00321.
07	MiniCPM-o-2.6	paper	41.1	2024	Table 3, arxiv:2501.00321.
08	Claude 3.5 Sonnet	paper	39.6	2024	Table 3, arxiv:2501.00321.
09	GLM-4V-9B	paper	36.6	2024	Table 3, arxiv:2501.00321.
10	GPT-4o	paper	32.2	2024	Table 3, arxiv:2501.00321.

Overall En Public

Overall En Public is the reported evaluation metric for OCRBench v2. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Overall En Public: verified, paper, vendor, community, unverified

Rank	Model	Trust	Score	Year	Notes
01	InternVL3-14B	paper	52.6	2025	Highest score on English public split. Table 2, arxiv:2501.00321.
02	Gemini 1.5 Pro	paper	51.9	2024	Table 2, arxiv:2501.00321. Gemini-1.5-Pro.
03	Ovis2-8B	paper	47.7	2025	Table 2, arxiv:2501.00321.
04	Step-1V	paper	46.7	2024	Table 2, arxiv:2501.00321.
05	Qwen2.5-VL-7B	paper	46.7	2025	Table 2, arxiv:2501.00321. Same as Step-1V average (46.7).
06	GPT-4o	paper	46.5	2024	Table 2, arxiv:2501.00321.
07	Claude 3.5 Sonnet	paper	45.2	2024	Table 2, arxiv:2501.00321. claude-3-5-sonnet-20241022.
08	MiniCPM-o-2.6	paper	45.1	2024	Table 2, arxiv:2501.00321.
09	DeepSeek-VL2-Small	paper	43.3	2024	Table 2, arxiv:2501.00321.
10	GLM-4V-9B	paper	42.6	2024	Table 2, arxiv:2501.00321.
11	Pixtral-12B	paper	40.3	2024	Table 2, arxiv:2501.00321.
12	LLaVA-OneVision-7B	paper	36.4	2024	Table 2, arxiv:2501.00321.
13	Cambrian-1-8B	paper	34.7	2024	Table 2, arxiv:2501.00321.
14	Molmo-7B	paper	34.5	2024	Table 2, arxiv:2501.00321.

Lineage

OCRBench v2 in context.

Predecessors (1)

superseded · 2023-05

OCRBench

10× more items, human-verified, EN+ZH parity, four public/private splits to combat contamination. Original v1 saturated within 18 months; v2 reopened the gap.

This benchmark (1)

active · 2024-12

OCRBench v2

Successors (3)

active · 2025-02

KITAB-Bench

Arabic-script-specific fork — same VLM-era 'comprehensive OCR' framing, but on a script English-centric benchmarks under-cover. Frontier models still post CER >0.13.

active · 2025-06

ThaiOCRBench

Thai-script equivalent — TED scoring over parse trees. Another working benchmark for a writing system that gets near-zero attention in English-centric papers.

active · 2024-12

OmniDocBench