Codesota · Benchmark · olmOCR-Bench

Allen Institute for AI

olmOCR-Bench.

7,010 unit tests across 1,402 PDF documents. Tests parsing of tables, math, multi-column layouts, old scans, and more.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

Base

Base is one of the sub-category scores reported for olmOCR-Bench; it covers clean base documents. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Base: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Notes	Trust	Score	Year	Links
01	Chandra v0.1.0	Base clean document parsing. Near-perfect	unverified	99.9	2025	Source ↗
02	chandra-ocr-0.1.0	Base clean document parsing. Near-perfect	paper	99.9	2025	Source ↗
03	olmOCR v0.4.0	olmOCR 2. Sub-category: base clean documents.	paper	99.7	2025	Source ↗
04	olmocr-v0.4.0	olmOCR 2. Base clean documents sub-category.	paper	99.7	2025	Source ↗
05	LightOnOCR-2-1B	Base clean documents sub-category.	paper	99.6	2026	Source ↗
06	Qianfan-OCR	Base clean documents sub-category.	paper	99.6	2026	Source ↗
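
The muted-row rule stated above the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Row` type and `is_muted` function are hypothetical names, a row counts as muted only when another row from the same or an earlier year has a strictly higher score, and the page's real logic (especially for ties) may differ. The scores and years are taken from the Base table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard entry (hypothetical structure, not Codesota's schema)."""
    model: str
    score: float
    year: int


def is_muted(row: Row, rows: list[Row]) -> bool:
    # Assumption: "already scored better" means strictly higher,
    # so tied rows do not mute each other.
    return any(o.year <= row.year and o.score > row.score for o in rows)


# Scores and years copied from the Base table.
rows = [
    Row("Chandra v0.1.0", 99.9, 2025),
    Row("chandra-ocr-0.1.0", 99.9, 2025),
    Row("olmOCR v0.4.0", 99.7, 2025),
    Row("LightOnOCR-2-1B", 99.6, 2026),
]

muted = [r.model for r in rows if is_muted(r, rows)]
print(muted)
```

Under these assumptions the two 99.9 entries stay unmuted, while the 99.7 (same year) and 99.6 (later year) entries are muted.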

Headers Footers

Headers Footers is the olmOCR-Bench score for the headers/footers sub-category. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Headers Footers: verified, paper, vendor, community, unverified

Rank	Model	Notes	Trust	Score	Year	Links
01	olmOCR v0.4.0	olmOCR 2. Sub-category: headers/footers.	paper	96.1	2025	Source ↗
02	olmocr-v0.4.0	olmOCR 2. Headers/footers sub-category.	paper	96.1	2025	Source ↗
03	olmOCR v0.3.0	#1 on headers/footers extraction	unverified	95.1	2025	Source ↗
04	olmocr-v0.3.0	#1 on headers/footers extraction	paper	95.1	2025	Source ↗
05	chandra-ocr-0.1.0	Header/footer extraction	paper	90.8	2025	Source ↗
06	Chandra v0.1.0	Header/footer extraction	unverified	90.8	2025	Source ↗

Long Tiny Text

Long Tiny Text is the olmOCR-Bench score for the sub-category of long documents with tiny text. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Long Tiny Text: verified, paper, vendor, community, unverified

Rank	Model	Notes	Trust	Score	Year	Links
01	Chandra v0.1.0	Long documents with tiny text. #1 in category	unverified	92.3	2025	Source ↗
02	chandra-ocr-0.1.0	Long documents with tiny text. #1 in category	paper	92.3	2025	Source ↗
03	LightOnOCR-2-1B	Long tiny text sub-category.	paper	91.4	2026	Source ↗
04	olmocr-v0.4.0	olmOCR 2. Long tiny text sub-category.	paper	81.9	2025	Source ↗
05	olmOCR v0.4.0	olmOCR 2. Sub-category: long tiny text.	paper	81.9	2025	Source ↗
06	Qianfan-OCR	Long tiny text sub-category.	paper	80.4	2026	Source ↗

Multi Column

Multi Column is the olmOCR-Bench score for the multi-column layout sub-category. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Multi Column: verified, paper, vendor, community, unverified

Rank	Model	Notes	Trust	Score	Year	Links
01	Qianfan-OCR	Multi-column layout sub-category.	paper	92.2	2026	Source ↗
02	LightOnOCR-2-1B	Multi-column layout sub-category.	paper	84.8	2026	Source ↗
03	olmocr-v0.4.0	olmOCR 2. Multi-column layout sub-category.	paper	83.7	2025	Source ↗
04	olmOCR v0.4.0	olmOCR 2. Sub-category: multi-column layout.	paper	83.7	2025	Source ↗
05	Chandra v0.1.0	Multi-column document parsing	unverified	81.2	2025	Source ↗
06	chandra-ocr-0.1.0	Multi-column document parsing	paper	81.2	2025	Source ↗

Arxiv

Arxiv is the olmOCR-Bench score for the ArXiv math documents sub-category. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Arxiv: verified, paper, vendor, community, unverified

Rank	Model	Notes	Trust	Score	Year	Links
01	LightOnOCR-2-1B	ArXiv math documents sub-category.	paper	89.6	2026	Source ↗
02	marker-1.10.0	#1 on ArXiv paper parsing	paper	83.8	2025	Source ↗
03	Marker 1.10.0	#1 on ArXiv paper parsing	unverified	83.8	2025	Source ↗
04	olmOCR v0.4.0	olmOCR 2 (arxiv:2510.19817). Sub-category: ArXiv math documents.	paper	83.0	2025	Source ↗
05	olmocr-v0.4.0	olmOCR 2 (arxiv:2510.19817). ArXiv math documents sub-category.	paper	83.0	2025	Source ↗
06	chandra-ocr-0.1.0	ArXiv paper parsing. Marker leads (83.8)	paper	82.2	2025	Source ↗
07	Chandra v0.1.0	ArXiv paper parsing. Marker leads (83.8)	unverified	82.2	2025	Source ↗
08	Qianfan-OCR	ArXiv math documents sub-category.	paper	80.1	2026	Source ↗

Tables

Tables is the olmOCR-Bench score for the table recognition sub-category. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Tables: verified, paper, vendor, community, unverified

Rank	Model	Notes	Trust	Score	Year	Links
01	LightOnOCR-2-1B	Table recognition sub-category.	paper	89.0	2026	Source ↗
02	dots.ocr 3B	#1 on table recognition	unverified	88.3	2025	Source ↗
03	dots-ocr-3b	#1 on table recognition	paper	88.3	2025	Source ↗
04	Chandra v0.1.0	Table recognition category. Near-best (dots.ocr: 88.3)	unverified	88.0	2025	Source ↗
05	chandra-ocr-0.1.0	Table recognition category. Near-best (dots.ocr: 88.3)	paper	88.0	2025	Source ↗
06	olmocr-v0.4.0	olmOCR 2. Table recognition sub-category.	paper	84.9	2025	Source ↗
07	olmOCR v0.4.0	olmOCR 2. Sub-category: table recognition.	paper	84.9	2025	Source ↗
08	Qianfan-OCR	Table recognition sub-category.	paper	81.6	2026	Source ↗

Accuracy

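Accuracy is a reported evaluation metric for olmOCR-Bench. For models listed under both Accuracy and Pass Rate, the two scores are identical, which suggests that both labels refer to the overall benchmark score. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Accuracy: verified, paper, vendor, community, unverified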

Rank	Model	Trust	Score	Year	Links
01	Infinity-Parser2-Pro	unverified	87.6	2026	Paper ↗
02	Chandra 2	unverified	85.9	2026	Paper ↗, Code ↗
03	dots.mocr	unverified	83.9	2026	Paper ↗, Code ↗
04	LightOnOCR-2-1B	unverified	83.2	2026	Paper ↗, Source ↗
05	Chandra	unverified	83.1	2025	Paper ↗
06	Infinity-Parser 7B	unverified	82.5	2025	Paper ↗, Code ↗
07	olmOCR-2-7B-1025 (7B)	unverified	82.4	2025	Paper ↗
08	Falcon-OCR	unverified	80.3	2026	Paper ↗, Code ↗
09	PaddleOCR-VL	unverified	80.0	2025	Paper ↗, Code ↗
10	Qianfan-OCR	unverified	79.8	2026	Paper ↗, Code ↗
11	dots.ocr	unverified	79.1	2025	Paper ↗, Code ↗
12	MinerU2.5	unverified	77.5	2025	Paper ↗, Code ↗
13	DeepSeek-OCR-2	unverified	76.3	2026	Paper ↗, Code ↗
14	LightOnOCR-1B-1025	unverified	76.1	2026	Paper ↗

Old Scans Math

Old Scans Math is the olmOCR-Bench score for the sub-category of old scans with math. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Old Scans Math: verified, paper, vendor, community, unverified

Rank	Model	Notes	Trust	Score	Year	Links
01	LightOnOCR-2-1B	Old scans with math sub-category.	paper	85.6	2026	Source ↗
02	olmocr-v0.4.0	olmOCR 2. Old scans with math sub-category.	paper	82.3	2025	Source ↗
03	olmOCR v0.4.0	olmOCR 2. Sub-category: old scans with math.	paper	82.3	2025	Source ↗
04	chandra-ocr-0.1.0	Mathematical notation in old scans. #1, leads by 5.4 points	paper	80.3	2025	Source ↗
05	Chandra v0.1.0	Mathematical notation in old scans. #1, leads by 5.4 points	unverified	80.3	2025	Source ↗
06	olmocr-v0.3.0	#2 on math in old scans	paper	79.9	2025	Source ↗
07	olmOCR v0.3.0	#2 on math in old scans	unverified	79.9	2025	Source ↗

Pass Rate

Pass Rate is the overall score reported for olmOCR-Bench, whose 7,010 unit tests span 1,402 PDF documents. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Pass Rate: verified, paper, vendor, community, unverified

Rank	Model	Notes	Trust	Score	Year	Links
01	dots.mocr	arxiv:2603.13032. Current SOTA on olmOCR-Bench (83.9 ± 0.9). 3B multimodal model.	unverified	83.9	2026	Source ↗
02	LightOnOCR-2-1B	SOTA at publication (Jan 2026). 9x smaller than Chandra-9B, 3.3x faster.	paper	83.2	2026	Source ↗
03	chandra-ocr-0.1.0	7,010 unit tests across 1,402 PDF documents. #1 overall on olmOCR-Bench.	paper	83.1	2025	Source ↗
04	Chandra v0.1.0	7,010 unit tests across 1,402 PDF documents. #1 overall on olmOCR-Bench.	unverified	83.1	2025	Source ↗
05	infinity-parser-7b		paper	82.5	2025	Source ↗
06	Infinity-Parser 7B		unverified	82.5	2025	Source ↗
07	olmOCR v0.4.0		unverified	82.4	2025	Source ↗
08	olmocr-v0.4.0		paper	82.4	2025	Source ↗
09	paddleocr-vl		paper	80.0	2025	Source ↗
10	Qianfan-OCR	Baidu Qianfan-OCR 4B (Qwen3-4B + Qianfan-ViT), Apache 2.0, 192 langs. Layout-as-Thought.	paper	79.8	2026	Source ↗
11	Qwen3-VL-4B	Qwen3-VL-4B score from Qianfan-OCR Table 3. General-purpose VLM baseline.	paper	79.2	2026	Source ↗
12	PaddleOCR-VL-1.5	arxiv:2601.21957. Score from Qianfan-OCR comparison table (Table 3). 0.9B params.	paper	79.1	2026	Source ↗
13	dots.ocr 3B		unverified	79.1	2025	Source ↗
14	dots-ocr-3b		paper	79.1	2025	Source ↗
15	mistral-ocr-3	Estimated based on 74% win rate vs OCR 2	paper	78.0	2025	Source ↗
16	Mistral OCR 3	Estimated based on 74% win rate vs OCR 2	unverified	78.0	2025	Source ↗
17	Marker 1.10.0		unverified	76.5	2025	Source ↗
18	marker-1.10.0		paper	76.5	2025	Source ↗
19	marker-1.10.1		paper	76.1	2025	Source ↗
20	Marker 1.10.1		unverified	76.1	2025	Source ↗
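
As a rough illustration of what a pass-rate score over unit tests means, the sketch below computes per-category pass rates and an unweighted mean across categories. Everything here is an assumption made for illustration: the `pass_rates` function, the category names, and the test outcomes are hypothetical, and this page does not state how olmOCR-Bench aggregates its categories into the overall score.

```python
from collections import defaultdict


def pass_rates(results):
    """Return the percentage of passed tests per category.

    `results` is an iterable of (category, passed) pairs, one per unit test.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        passes[category] += int(passed)
    return {c: 100.0 * passes[c] / totals[c] for c in totals}


# Hypothetical outcomes, not real benchmark data.
results = [
    ("tables", True), ("tables", True), ("tables", False), ("tables", True),
    ("multi_column", True), ("multi_column", False),
]

per_category = pass_rates(results)

# Assumption: the overall score is the unweighted mean of category rates.
overall = sum(per_category.values()) / len(per_category)
print(per_category, overall)
```

With an unweighted mean, a small category counts as much as a large one, so the result can differ from the share of all tests passed (here 4 of 6).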

Lineage

olmOCR-Bench in context.

Predecessors (1)

PDF-focused and harder. olmOCR-Bench targets the failure modes that OmniDocBench averages out: old scans, math equations, mixed columns, and headers/footers. In short: 'Where does it actually break.'

This benchmark (1)

active · 2025-03

olmOCR-Bench

Successors (1)

Agent-grade document parsing: enterprise docs (insurance, finance, government), 169K rule-based tests across five orthogonal axes, no LLM-as-judge. The frontier where 'OCR' meets 'agent ingestion pipeline'.
