Allen Institute for AI

olmOCR-Bench.

PDF content extraction benchmark with 7,010 unit tests across 1,402 PDF documents.

§ 01 · base

base.

Higher is better

Scores fetched from the CodeSOTA API on 2026-04-20.

| # | Model | Score | Source |
|---|-------|-------|--------|
| 1 | chandra-ocr-0.1.0 | 99.9 | codesota-api |
| 2 | olmocr-v0.4.0 | 99.7 | codesota-api |
| 3 | LightOnOCR-2-1B | 99.6 | codesota-api |
| 4 | Qianfan-OCR | 99.6 | codesota-api |
§ 02 · headers-footers

headers-footers.

Higher is better

Scores fetched from the CodeSOTA API on 2026-04-20.

| # | Model | Score | Source |
|---|-------|-------|--------|
| 1 | olmocr-v0.4.0 | 96.1 | codesota-api |
| 2 | olmocr-v0.3.0 | 95.1 | codesota-api |
| 3 | chandra-ocr-0.1.0 | 90.8 | codesota-api |
| 4 | Qianfan-OCR | 42 | codesota-api |
§ 03 · long-tiny-text

long-tiny-text.

Higher is better

Scores fetched from the CodeSOTA API on 2026-04-20.

| # | Model | Score | Source |
|---|-------|-------|--------|
| 1 | chandra-ocr-0.1.0 | 92.3 | codesota-api |
| 2 | LightOnOCR-2-1B | 91.4 | codesota-api |
| 3 | olmocr-v0.4.0 | 81.9 | codesota-api |
| 4 | Qianfan-OCR | 80.4 | codesota-api |
§ 04 · multi-column

multi-column.

Higher is better

Scores fetched from the CodeSOTA API on 2026-04-20.

| # | Model | Score | Source |
|---|-------|-------|--------|
| 1 | Qianfan-OCR | 92.2 | codesota-api |
| 2 | LightOnOCR-2-1B | 84.8 | codesota-api |
| 3 | olmocr-v0.4.0 | 83.7 | codesota-api |
| 4 | chandra-ocr-0.1.0 | 81.2 | codesota-api |
§ 05 · arxiv

arxiv.

Higher is better

Scores fetched from the CodeSOTA API on 2026-04-20.

| # | Model | Score | Source |
|---|-------|-------|--------|
| 1 | LightOnOCR-2-1B | 89.6 | codesota-api |
| 2 | marker-1.10.0 | 83.8 | codesota-api |
| 3 | olmocr-v0.4.0 | 83 | codesota-api |
| 4 | chandra-ocr-0.1.0 | 82.2 | codesota-api |
| 5 | Qianfan-OCR | 80.1 | codesota-api |
§ 06 · tables

tables.

Higher is better

Scores fetched from the CodeSOTA API on 2026-04-20.

| # | Model | Score | Source |
|---|-------|-------|--------|
| 1 | LightOnOCR-2-1B | 89 | codesota-api |
| 2 | dots-ocr-3b | 88.3 | codesota-api |
| 3 | chandra-ocr-0.1.0 | 88 | codesota-api |
| 4 | olmocr-v0.4.0 | 84.9 | codesota-api |
| 5 | Qianfan-OCR | 81.6 | codesota-api |
§ 07 · old-scans-math

old-scans-math.

Higher is better

Scores fetched from the CodeSOTA API on 2026-04-20.

| # | Model | Score | Source |
|---|-------|-------|--------|
| 1 | LightOnOCR-2-1B | 85.6 | codesota-api |
| 2 | olmocr-v0.4.0 | 82.3 | codesota-api |
| 3 | chandra-ocr-0.1.0 | 80.3 | codesota-api |
| 4 | olmocr-v0.3.0 | 79.9 | codesota-api |
§ 08 · Pass Rate

Pass Rate.

Percentage of unit tests passed

Higher is better

Scores fetched from the CodeSOTA API on 2026-04-20.

| # | Model | Score | Source |
|---|-------|-------|--------|
| 1 | dots.mocr | 83.9% | codesota-api |
| 2 | LightOnOCR-2-1B | 83.2% | codesota-api |
| 3 | chandra-ocr-0.1.0 | 83.1% | codesota-api |
| 4 | infinity-parser-7b | 82.5% | codesota-api |
| 5 | olmocr-v0.4.0 | 82.4% | codesota-api |
| 6 | paddleocr-vl | 80% | codesota-api |
| 7 | Qianfan-OCR | 79.8% | codesota-api |
| 8 | Qwen3-VL-4B | 79.2% | codesota-api |
| 9 | PaddleOCR-VL-1.5 | 79.1% | codesota-api |
| 10 | dots-ocr-3b | 79.1% | codesota-api |
| 11 | mistral-ocr-3 | 78% | codesota-api |
| 12 | marker-1.10.0 | 76.5% | codesota-api |
| 13 | marker-1.10.1 | 76.1% | codesota-api |
| 14 | MonkeyOCR-pro-3B | 75.8% | codesota-api |
| 15 | deepseek-ocr | 75.7% | codesota-api |
| 16 | mineru-2.5 | 75.2% | codesota-api |
| 17 | mistral-ocr-api | 72% | codesota-api |
| 18 | gpt-4o-anchored | 69.9% | codesota-api |
| 19 | nanonets-ocr2-3b | 69.5% | codesota-api |
| 20 | gemini-flash-2 | 63.8% | codesota-api |
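The Pass Rate metric is described here only as the percentage of unit tests passed. As a minimal sketch of that definition, the Python below computes the percentage overall and per category from a list of test outcomes. The `UnitTestResult` record and its field names are hypothetical, introduced for illustration; the benchmark's own scoring code may aggregate results differently (for example, by averaging per-category scores).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UnitTestResult:
    """One unit-test outcome (hypothetical record, not the benchmark's schema)."""
    pdf_id: str     # identifier of the PDF document the test belongs to
    category: str   # test category, e.g. "tables" or "old-scans"
    passed: bool    # True if the extracted text satisfied the test


def pass_rate(results):
    """Percentage of unit tests passed; returns 0.0 for an empty list."""
    if not results:
        return 0.0
    passed = sum(1 for r in results if r.passed)
    return 100.0 * passed / len(results)


def pass_rate_by_category(results):
    """Pass rate computed separately for each category."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r.category].append(r)
    return {category: pass_rate(rs) for category, rs in grouped.items()}
```

Note that the plain percentage over all tests weights each category by its number of tests, whereas a mean of per-category scores weights every category equally. The two can differ, so it is worth checking which one a reported number uses before comparing models.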
§ 09 · old-scans

old-scans.

Higher is better

Scores fetched from the CodeSOTA API on 2026-04-20.

| # | Model | Score | Source |
|---|-------|-------|--------|
| 1 | Qianfan-OCR | 73.1 | codesota-api |
| 2 | chandra-ocr-0.1.0 | 50.4 | codesota-api |
| 3 | olmocr-v0.4.0 | 47.7 | codesota-api |
| 4 | LightOnOCR-2-1B | 42.2 | codesota-api |
| 5 | gpt-4o | 40.7 | codesota-api |