OCRBench v2.

Tests 8 core OCR capabilities across 23 tasks, evaluating large multimodal models (LMMs) on skills including text recognition, referring and extraction.


§ 01 · Leaderboard

Best published scores.

32 results indexed across 2 metrics. Shaded row marks current SOTA; ties broken by submission date.

Primary: overall-en-private · higher is better
All metrics: overall-en-private, overall-zh-private

overall-en-private · primary

27 rows

#	Model	Org	Submitted	Paper / code	overall-en-private
01	Seed1.6-vision (API)	ByteDance	Jun 2025	alphaxiv-leaderboard	62.20
02	Qwen3-Omni-30B (OSS)	Alibaba	Apr 2025	alphaxiv-leaderboard	61.30
03	Nemotron Nano V2 VL (OSS)	NVIDIA	Mar 2025	alphaxiv-leaderboard	61.20
04	Gemini 2.5 Pro (API)	Google	Mar 2025	alphaxiv-leaderboard	59.30
05	llama-3.1-nemotron-nano-vl-8b	—	Mar 2025	ocrbench-v2-leaderboard	56.40
06	GPT-4o (API)	OpenAI	May 2024	alphaxiv-leaderboard	55.50
07	ovis2.5-8b	—	Feb 2025	ocrbench-v2-leaderboard	54.10
08	gemini-1.5-pro	—	May 2024	ocrbench-v2-leaderboard	51.60
09	sail-vl2-8b	—	Mar 2025	ocrbench-v2-leaderboard	49.30
10	minicpm-v-4.5-8b	—	May 2025	ocrbench-v2-leaderboard	48.40
11	gpt-4o-2024	—	May 2024	ocrbench-v2-leaderboard	47.60
12	claude-3.5-sonnet	—	Jun 2024	ocrbench-v2-leaderboard	47.50
13	internvl3.5-14b	—	Jun 2025	ocrbench-v2-leaderboard	47.10
14	step-1v	—	Dec 2024	ocrbench-v2-leaderboard	46.80
15	grok4	—	Jul 2025	ocrbench-v2-leaderboard	45.00
16	GPT-4o mini	OpenAI	Jul 2024	ocrbench-v2-leaderboard	44.10
17	Claude Sonnet 4 (API)	Anthropic	May 2025	ocrbench-v2-leaderboard	42.40
18	qwen2.5-vl-7b	—	Jan 2025	ocrbench-v2-leaderboard	41.80
19	deepseek-vl2-small	—	Dec 2024	ocrbench-v2-leaderboard	41.00
20	pixtral-12b	—	Sep 2024	ocrbench-v2-leaderboard	38.40
21	phi-4-multimodal	—	Feb 2025	ocrbench-v2-leaderboard	38.10
22	glm-4v-9b	—	Jun 2024	ocrbench-v2-leaderboard	37.10
23	molmo-7b	—	Sep 2024	ocrbench-v2-leaderboard	33.90
24	llava-ov-7b	—	Oct 2024	ocrbench-v2-leaderboard	33.70
25	idefics3-8b	—	Aug 2024	ocrbench-v2-leaderboard	26.00
26	mistral-ocr-2512	—	Dec 2024	codesota-verified	25.20
27	docowl2	—	May 2024	ocrbench-v2-leaderboard	23.40

overall-zh-private

5 rows

#	Model	Org	Submitted	Paper / code	overall-zh-private
01	Gemini 2.5 Pro (API)	Google	Mar 2025	alphaxiv-leaderboard	62.20
02	minicpm-v-4.5-8b	—	May 2025	ocrbench-v2-leaderboard	58.80
03	sail-vl2-8b	—	Mar 2025	ocrbench-v2-leaderboard	57.60
04	claude-3.5-sonnet	—	Jun 2024	ocrbench-v2-leaderboard	48.40
05	gpt-4o-2024	—	May 2024	ocrbench-v2-leaderboard	45.70

Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
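The ordering rule stated for the leaderboard (score descending, ties broken by submission date) can be sketched in Python as follows. This is a minimal sketch, not Codesota's code: `Entry` and `rank` are hypothetical names, the tables only give months so the day is set to 1, and it assumes the earlier submission wins a tie. Fed the overall-zh-private rows, it reproduces that table's order.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    # Hypothetical record type; field names are illustrative only.
    model: str
    submitted: date  # the tables give month precision, so day is set to 1
    score: float


def rank(entries: list[Entry]) -> list[tuple[int, Entry]]:
    """Sort by score (higher is better); on equal scores the earlier
    submission ranks first (assumed direction of the tie-break)."""
    ordered = sorted(entries, key=lambda e: (-e.score, e.submitted))
    return [(i + 1, e) for i, e in enumerate(ordered)]


# Rows of the overall-zh-private table, deliberately listed out of order.
zh_rows = [
    Entry("gpt-4o-2024", date(2024, 5, 1), 45.70),
    Entry("Gemini 2.5 Pro", date(2025, 3, 1), 62.20),
    Entry("claude-3.5-sonnet", date(2024, 6, 1), 48.40),
    Entry("minicpm-v-4.5-8b", date(2025, 5, 1), 58.80),
    Entry("sail-vl2-8b", date(2025, 3, 1), 57.60),
]

for pos, e in rank(zh_rows):
    print(f"{pos:02d}\t{e.model}\t{e.score:.2f}")
```

Sorting on the tuple `(-score, submitted)` applies both keys in one pass, so no separate tie-handling step is needed.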

§ 03 · Progress

4 steps of state of the art.

Each row below marks a model that broke the previous record on overall-en-private. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher is better, so each entry strictly exceeds the score of the entry before it.

SOTA line · overall-en-private

Date	Model	Org	overall-en-private
May 13, 2024	GPT-4o	OpenAI	55.50
Mar 18, 2025	Nemotron Nano V2 VL	NVIDIA	61.20
Apr 29, 2025	Qwen3-Omni-30B	Alibaba	61.30
Jun 15, 2025	Seed1.6-vision	ByteDance	62.20

Fig 3 · SOTA-setting models only. 4 entries span May 2024 → Jun 2025.
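The SOTA line can be derived from the leaderboard by walking results in date order and keeping each one that strictly beats the best score so far. The sketch below assumes exactly that rule; `Result` and `sota_steps` are hypothetical names, and dates are (year, month) tuples because the leaderboard gives month precision. Python's sort is stable, so entries from the same month keep their input order; the real chart relies on full dates to separate them. The sample is a subset of the overall-en-private rows that omits Nemotron Nano V2 VL, so it yields three steps rather than four.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    # Hypothetical record type; field names are illustrative only.
    model: str
    submitted: tuple[int, int]  # (year, month): the table's precision
    score: float


def sota_steps(results: list[Result]) -> list[Result]:
    """Return the results that strictly beat the best score seen so far,
    walking them in chronological order (higher is better)."""
    steps: list[Result] = []
    best = float("-inf")
    # sorted() is stable: results from the same month keep input order.
    for r in sorted(results, key=lambda r: r.submitted):
        if r.score > best:  # assumption: equalling the record is not a new step
            steps.append(r)
            best = r.score
    return steps


# Subset of overall-en-private rows (Nemotron Nano V2 VL left out).
rows = [
    Result("claude-3.5-sonnet", (2024, 6), 47.50),
    Result("GPT-4o", (2024, 5), 55.50),
    Result("ovis2.5-8b", (2025, 2), 54.10),
    Result("Qwen3-Omni-30B", (2025, 4), 61.30),
    Result("grok4", (2025, 7), 45.00),
    Result("Seed1.6-vision", (2025, 6), 62.20),
]

for r in sota_steps(rows):
    print(r.submitted, r.model, f"{r.score:.2f}")
```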

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score and, if it takes the top spot, annotate the step on the progress chart with your name.


What a submission needs

01 · A public checkpoint or API endpoint
02 · A reproduction script with frozen commit + seed
03 · Declared evaluation environment (Python, deps)
04 · One row per metric declared by this dataset
05 · A contact so we can follow up on discrepancies
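A submitter could check the five items above before sending anything. The sketch below is hypothetical throughout: the `Submission` fields and the `problems` helper are illustrative names, not Codesota's submission format. It assumes a submission may report any subset of the two declared metrics (the overall-zh-private table has far fewer rows than the English one) but never an undeclared metric or two rows for the same metric.

```python
from dataclasses import dataclass
from typing import Optional

# Metrics declared by this dataset, as listed on the leaderboard.
DECLARED_METRICS = {"overall-en-private", "overall-zh-private"}


@dataclass
class Submission:
    # Hypothetical fields mirroring checklist items 01-05.
    endpoint: str                  # 01 checkpoint URL or API endpoint
    repro_script: str              # 02 path or URL of the reproduction script
    commit: str                    # 02 frozen commit hash
    seed: Optional[int]            # 02 random seed
    environment: dict[str, str]    # 03 Python version and pinned deps
    rows: list[tuple[str, float]]  # 04 (metric, score) rows
    contact: str                   # 05 email or handle


def problems(sub: Submission) -> list[str]:
    """List checklist violations; an empty list means none were found."""
    issues = []
    if not sub.endpoint:
        issues.append("01: missing checkpoint or API endpoint")
    if not (sub.repro_script and sub.commit and sub.seed is not None):
        issues.append("02: script, frozen commit and seed are all required")
    if "python" not in sub.environment:
        issues.append("03: declare at least the Python version")
    metrics = [m for m, _ in sub.rows]
    if not metrics:
        issues.append("04: no metric rows")
    for m in sorted(set(metrics)):
        if m not in DECLARED_METRICS:
            issues.append(f"04: {m!r} is not declared by this dataset")
        if metrics.count(m) > 1:
            issues.append(f"04: more than one row for {m!r}")
    if not sub.contact:
        issues.append("05: missing contact")
    return issues
```

An empty list from `problems` only means these structural checks pass; it says nothing about whether the score reproduces.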
