Who leads the olmOCR-Bench benchmark?

dots.mocr currently leads olmOCR-Bench with a score of 83.90 on pass-rate.

What is the state-of-the-art score on olmOCR-Bench?

The state-of-the-art result on olmOCR-Bench is 83.90 (pass-rate), achieved by dots.mocr as of 2026.

How many models are tracked on olmOCR-Bench?

Codesota tracks 34 models on olmOCR-Bench across 10 metrics.

When was the olmOCR-Bench leaderboard last updated?

The olmOCR-Bench leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2025.


Document Parsing · benchmark dataset · 2024 · EN

olmOCR-Bench.

Name: olmOCR-Bench Benchmark Results
Creator: Codesota
Published: 2025-01-01
License: https://creativecommons.org/licenses/by/4.0/

7,010 unit tests across 1,402 PDF documents. Tests parsing of tables, math, multi-column layouts, old scans, and more.


§ 01 · Leaderboard

Best published scores.

74 results indexed across 10 metrics. Shaded row marks current SOTA; ties broken by submission date.

Primary: pass-rate · higher is better
All metrics: accuracy, arxiv, base, headers-footers, long-tiny-text, multi-column, old-scans, old-scans-math, pass-rate, tables
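
The ordering rule above (higher score first, ties broken by submission date) can be sketched as a two-part sort key. This is a minimal illustration, not Codesota's implementation: the `rank` helper and the row layout are hypothetical, and the tie-break direction (earlier submission first) is an assumption taken from the 79.10 tie in the pass-rate table. The 99.60 tie in the base table is ordered the other way, so the page does not settle the direction.

```python
def rank(rows):
    """Sort by score, highest first; break ties by earlier (year, month).

    Assumption: the earlier submission wins a tie. The page only says
    that ties are broken by submission date.
    """
    return sorted(rows, key=lambda r: (-r["score"], r["submitted"]))


# Three pass-rate rows from the leaderboard, deliberately out of order.
rows = [
    {"model": "PaddleOCR-VL-1.5", "score": 79.10, "submitted": (2026, 3)},
    {"model": "dots.ocr 3B", "score": 79.10, "submitted": (2025, 12)},
    {"model": "Qwen3-VL-4B", "score": 79.20, "submitted": (2026, 3)},
]

ranked = [r["model"] for r in rank(rows)]
print(ranked)  # ['Qwen3-VL-4B', 'dots.ocr 3B', 'PaddleOCR-VL-1.5']
```

Negating the score keeps the whole key in one ascending sort, so the date component is only consulted when two scores are equal.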

accuracy

18 rows

#	Model	Org	Submitted	Paper / code	accuracy
01	Infinity-Parser2-Pro	—	May 2026	pwc-dump	87.60
02	Chandra 2	—	Mar 2026	pwc-dump · code	85.90
03	dots.mocr	—	Mar 2026	Multimodal OCR: Parse Anything from Documents · code	83.90
04	LightOnOCR-2-1B [Open]	LightOn	Jan 2026	LightOnOCR: A 1B End-to-End Multilingual Vision-Language…	83.20
05	Chandra	—	Oct 2025	pwc-dump	83.10
06	Infinity-Parser 7B [Open]	—	Jun 2025	Infinity Parser: Layout Aware Reinforcement Learning for… · code	82.50
07	olmOCR-2-7B-1025 (7B)	—	Oct 2025	olmOCR 2: Unit Test Rewards for Document OCR	82.40
08	Falcon-OCR	—	Mar 2026	Falcon Perception · code	80.30
09	PaddleOCR-VL [Open]	Baidu	Oct 2025	PaddleOCR-VL: Boosting Multilingual Document Parsing via… · code	80.00
10	Qianfan-OCR [Open]	Baidu Qianfan	Mar 2026	Qianfan-OCR: A Unified End-to-End Model for Document Int… · code	79.80
11	dots.ocr	—	Dec 2025	dots.ocr: Multilingual Document Layout Parsing in a Sing… · code	79.10
12	MinerU2.5	—	Sep 2025	MinerU2.5: A Decoupled Vision-Language Model for Efficie… · code	77.50
13	DeepSeek-OCR-2	—	Jan 2026	DeepSeek-OCR 2: Visual Causal Flow · code	76.30
14	LightOnOCR-1B-1025	—	Jan 2026	LightOnOCR: A 1B End-to-End Multilingual Vision-Language…	76.10
15	DeepSeek-OCR [Open]	DeepSeek	Oct 2025	DeepSeek-OCR: Contexts Optical Compression · code	75.70
16	olmOCR-7B [Open]	Allen AI	Feb 2025	olmOCR: Unlocking Trillions of Tokens in PDFs with Visio… · code	75.50
17	GLM-OCR [Open]	Zhipu AI	Mar 2026	GLM-OCR Technical Report	75.20
18	FireRed-OCR	—	Mar 2026	FireRed-OCR Technical Report · code	70.20

arxiv

5 rows

#	Model	Org	Submitted	Paper / code	arxiv
01	LightOnOCR-2-1B [Open]	LightOn	Jan 2026	paper	89.60
02	Marker 1.10.0 [Open]	VikParuchuri	Dec 2025	github-readme	83.80
03	olmOCR v0.4.0 [Open]	Allen AI	Oct 2025	paper	83.00
04	Chandra v0.1.0 [Open]	datalab-to	Dec 2025	github-readme	82.20
05	Qianfan-OCR [Open]	Baidu Qianfan	Mar 2026	paper	80.10

base

4 rows

#	Model	Org	Submitted	Paper / code	base
01	Chandra v0.1.0 [Open]	datalab-to	Dec 2025	github-readme	99.90
02	olmOCR v0.4.0 [Open]	Allen AI	Oct 2025	paper	99.70
03	Qianfan-OCR [Open]	Baidu Qianfan	Mar 2026	paper	99.60
04	LightOnOCR-2-1B [Open]	LightOn	Jan 2026	paper	99.60

headers-footers

4 rows

#	Model	Org	Submitted	Paper / code	headers-footers
01	olmOCR v0.4.0 [Open]	Allen AI	Oct 2025	paper	96.10
02	olmOCR v0.3.0 [Open]	Allen AI	Dec 2025	github-readme	95.10
03	Chandra v0.1.0 [Open]	datalab-to	Dec 2025	github-readme	90.80
04	Qianfan-OCR [Open]	Baidu Qianfan	Mar 2026	paper	42.00

long-tiny-text

4 rows

#	Model	Org	Submitted	Paper / code	long-tiny-text
01	Chandra v0.1.0 [Open]	datalab-to	Dec 2025	github-readme	92.30
02	LightOnOCR-2-1B [Open]	LightOn	Jan 2026	paper	91.40
03	olmOCR v0.4.0 [Open]	Allen AI	Oct 2025	paper	81.90
04	Qianfan-OCR [Open]	Baidu Qianfan	Mar 2026	paper	80.40

multi-column

4 rows

#	Model	Org	Submitted	Paper / code	multi-column
01	Qianfan-OCR [Open]	Baidu Qianfan	Mar 2026	paper	92.20
02	LightOnOCR-2-1B [Open]	LightOn	Jan 2026	paper	84.80
03	olmOCR v0.4.0 [Open]	Allen AI	Oct 2025	paper	83.70
04	Chandra v0.1.0 [Open]	datalab-to	Dec 2025	github-readme	81.20

old-scans

5 rows

#	Model	Org	Submitted	Paper / code	old-scans
01	Qianfan-OCR [Open]	Baidu Qianfan	Mar 2026	paper	73.10
02	Chandra v0.1.0 [Open]	datalab-to	Dec 2025	github-readme	50.40
03	olmOCR v0.4.0 [Open]	Allen AI	Oct 2025	paper	47.70
04	LightOnOCR-2-1B [Open]	LightOn	Jan 2026	paper	42.20
05	GPT-4o [API]	OpenAI	Dec 2025	github-readme	40.70

old-scans-math

4 rows

#	Model	Org	Submitted	Paper / code	old-scans-math
01	LightOnOCR-2-1B [Open]	LightOn	Jan 2026	paper	85.60
02	olmOCR v0.4.0 [Open]	Allen AI	Oct 2025	paper	82.30
03	Chandra v0.1.0 [Open]	datalab-to	Dec 2025	github-readme	80.30
04	olmOCR v0.3.0 [Open]	Allen AI	Dec 2025	github-readme	79.90

pass-rate · primary

21 rows

#	Model	Org	Submitted	Paper / code	pass-rate
01	dots.mocr [Open]	RedNote	Mar 2026	github-readme	83.90
02	LightOnOCR-2-1B [Open]	LightOn	Jan 2026	paper	83.20
03	Chandra v0.1.0 [Open]	datalab-to	Dec 2025	alphaxiv-leaderboard	83.10
04	Infinity-Parser 7B [Open]	—	Dec 2025	alphaxiv-leaderboard	82.50
05	olmOCR v0.4.0 [Open]	Allen AI	Dec 2025	alphaxiv-leaderboard	82.40
06	PaddleOCR-VL [Open]	Baidu	Dec 2025	alphaxiv-leaderboard	80.00
07	Qianfan-OCR [Open]	Baidu Qianfan	Mar 2026	paper	79.80
08	Qwen3-VL-4B [Open]	Alibaba Qwen	Mar 2026	paper	79.20
09	dots.ocr 3B [Open]	RedNote HILab	Dec 2025	github-readme	79.10
10	PaddleOCR-VL-1.5 [Open]	Baidu PaddlePaddle	Mar 2026	paper	79.10
11	Mistral OCR 3 [API]	Mistral	Dec 2025	mistral-announcement	78.00
12	Marker 1.10.0 [Open]	VikParuchuri	Dec 2025	github-readme	76.50
13	Marker 1.10.1 [Open]	VikParuchuri	Dec 2025	alphaxiv-leaderboard	76.10
14	MonkeyOCR-pro-3B [Open]	—	Jun 2025	paper	75.80
15	DeepSeek-OCR [Open]	DeepSeek	Dec 2025	alphaxiv-leaderboard	75.70
16	DeepSeek-OCR [Open]	DeepSeek	Dec 2025	github-readme	75.40
17	MinerU 2.5 [Open]	OpenDataLab	Dec 2025	alphaxiv-leaderboard	75.20
18	Mistral OCR 2 [API]	Mistral	Dec 2025	alphaxiv-leaderboard	72.00
19	GPT-4o (Anchored)	OpenAI	Dec 2025	github-readme	69.90
20	Nanonets OCR2 3B	Nanonets	Dec 2025	alphaxiv-leaderboard	69.50
21	Gemini Flash 2	Google	Dec 2025	github-readme	63.80

tables

5 rows

#	Model	Org	Submitted	Paper / code	tables
01	LightOnOCR-2-1B [Open]	LightOn	Jan 2026	paper	89.00
02	dots.ocr 3B [Open]	RedNote HILab	Dec 2025	github-readme	88.30
03	Chandra v0.1.0 [Open]	datalab-to	Dec 2025	github-readme	88.00
04	olmOCR v0.4.0 [Open]	Allen AI	Oct 2025	paper	84.90
05	Qianfan-OCR [Open]	Baidu Qianfan	Mar 2026	paper	81.60

Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
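
The page does not say how pass-rate relates to the eight per-category metrics. As a rough check, assuming pass-rate is the unweighted mean of the category scores (an assumption, not something stated here), the rows for the two models that report all eight categories can be averaged and compared with their pass-rate rows. The `macro_average` helper is hypothetical; the numbers are copied from the tables above.

```python
from statistics import mean

# The eight document-type metrics listed on this page (excludes the
# aggregate-looking "accuracy" and "pass-rate" metrics).
CATEGORIES = [
    "arxiv", "base", "headers-footers", "long-tiny-text",
    "multi-column", "old-scans", "old-scans-math", "tables",
]

# Per-category scores copied from the leaderboard tables.
scores = {
    "olmOCR v0.4.0": {
        "arxiv": 83.00, "base": 99.70, "headers-footers": 96.10,
        "long-tiny-text": 81.90, "multi-column": 83.70,
        "old-scans": 47.70, "old-scans-math": 82.30, "tables": 84.90,
    },
    "Chandra v0.1.0": {
        "arxiv": 82.20, "base": 99.90, "headers-footers": 90.80,
        "long-tiny-text": 92.30, "multi-column": 81.20,
        "old-scans": 50.40, "old-scans-math": 80.30, "tables": 88.00,
    },
}


def macro_average(by_category):
    """Unweighted mean over the eight categories (assumed aggregation)."""
    return mean(by_category[c] for c in CATEGORIES)


for model, by_category in scores.items():
    print(model, round(macro_average(by_category), 1))
# olmOCR v0.4.0 82.4
# Chandra v0.1.0 83.1
```

Both values agree with the listed pass-rate rows (82.40 and 83.10), which is consistent with the assumption but does not prove it for every model.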

§ 03 · Progress

4 steps of state of the art.

Each row below marks a model that broke the previous record on pass-rate. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · pass-rate

Date	Model	Org	pass-rate
Jun 5, 2025	MonkeyOCR-pro-3B	—	75.80
Dec 16, 2025	Chandra v0.1.0	datalab-to	83.10
Jan 20, 2026	LightOnOCR-2-1B	LightOn	83.20
Mar 19, 2026	dots.mocr	RedNote	83.90

Fig 3 · SOTA-setting models only. 4 entries span Jun 2025 → Mar 2026.
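
The rule described above (keep only entries that beat the previous record) amounts to a running maximum over date-ordered results. A minimal sketch, assuming results arrive as `(date, model, score)` tuples; the `sota_steps` helper is hypothetical and is not Codesota's code.

```python
from datetime import date


def sota_steps(results):
    """Return entries that strictly beat the running best, in date order."""
    steps = []
    best = float("-inf")
    for when, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            steps.append((when, model, score))
            best = score
    return steps


results = [
    (date(2026, 3, 19), "dots.mocr", 83.90),
    (date(2025, 6, 5), "MonkeyOCR-pro-3B", 75.80),
    (date(2026, 1, 20), "LightOnOCR-2-1B", 83.20),
    (date(2025, 12, 16), "Chandra v0.1.0", 83.10),
    # A non-record entry. The page lists only "Mar 2026" for it, so the
    # day of the month here is a placeholder.
    (date(2026, 3, 1), "Qianfan-OCR", 79.80),
]

steps = sota_steps(results)
for when, model, score in steps:
    print(when.isoformat(), model, score)
```

With the five inputs above, the Qianfan-OCR entry is dropped because 79.80 does not exceed the 83.20 record standing at that point, leaving the four steps listed in the SOTA line.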

§ 04 · Literature

14 papers tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

Paper	Published	Models	Links
Falcon Perception	Mar 2026	Falcon-OCR	arXiv · code
Multimodal OCR: Parse Anything from Documents	Mar 2026	dots.mocr	arXiv · code
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence	Mar 2026	Qianfan-OCR	arXiv · code
GLM-OCR Technical Report	Mar 2026	GLM-OCR	arXiv
FireRed-OCR Technical Report	Mar 2026	FireRed-OCR	arXiv · code
DeepSeek-OCR 2: Visual Causal Flow	Jan 2026	DeepSeek-OCR-2	arXiv · code
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR	Jan 2026	LightOnOCR-2-1B, LightOnOCR-1B-1025	arXiv
dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model	Dec 2025	dots.ocr	arXiv · code
olmOCR 2: Unit Test Rewards for Document OCR	Oct 2025	olmOCR-2-7B-1025 (7B)	arXiv
DeepSeek-OCR: Contexts Optical Compression	Oct 2025	DeepSeek-OCR	arXiv · code
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model	Oct 2025	PaddleOCR-VL	arXiv · code
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing	Sep 2025	MinerU2.5	arXiv · code
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing	Jun 2025	Infinity-Parser 7B	arXiv · code
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models	Feb 2025	olmOCR-7B	arXiv · code

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.


What a submission needs

01 · A public checkpoint or API endpoint
02 · A reproduction script with frozen commit + seed
03 · Declared evaluation environment (Python, deps)
04 · One row per metric declared by this dataset
05 · A contact so we can follow up on discrepancies
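
The "one row per metric" requirement can be checked before submitting. The sketch below is illustrative only: the `check_rows` helper and the `(metric, score)` row layout are assumptions, not Codesota's submission schema. The metric names come from the "All metrics" list on this page, and the sample rows are the olmOCR v0.4.0 scores from the leaderboard.

```python
from collections import Counter

# Metric names declared for this dataset, as listed on the page.
DECLARED_METRICS = (
    "accuracy", "arxiv", "base", "headers-footers", "long-tiny-text",
    "multi-column", "old-scans", "old-scans-math", "pass-rate", "tables",
)


def check_rows(rows):
    """Report declared metrics that are missing, unknown, or repeated.

    `rows` is a list of (metric, score) pairs (hypothetical layout).
    """
    counts = Counter(metric for metric, _score in rows)
    return {
        "missing": [m for m in DECLARED_METRICS if m not in counts],
        "unknown": sorted(m for m in counts if m not in DECLARED_METRICS),
        "duplicated": sorted(m for m, n in counts.items() if n > 1),
    }


# olmOCR v0.4.0 as it appears in the tables: nine of the ten metrics.
olmocr_v040 = [
    ("arxiv", 83.00), ("base", 99.70), ("headers-footers", 96.10),
    ("long-tiny-text", 81.90), ("multi-column", 83.70),
    ("old-scans", 47.70), ("old-scans-math", 82.30),
    ("pass-rate", 82.40), ("tables", 84.90),
]

report = check_rows(olmocr_v040)
print(report)  # {'missing': ['accuracy'], 'unknown': [], 'duplicated': []}
```

A submission that passes this check has exactly one row for each declared metric; the other requirements (frozen commit, seed, environment) need their own manifest fields.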
