Document Parsing | 2024 | en

olmOCR-Bench

7,010 unit tests across 1,402 PDF documents. Tests parsing of tables, math, multi-column layouts, old scans, and more.

Samples: 1,402
Metrics: pass-rate, tables, old-scans-math, long-tiny-text, base, headers-footers, multi-column, arxiv, old-scans
Current State of the Art

Chandra v0.1.0 (datalab-to): 83.1 pass-rate

Top Models Performance Comparison

Top 10 models ranked by pass-rate

#   Model               pass-rate  % of best
1   Chandra v0.1.0      83.1       100.0%
2   Infinity-Parser 7B  82.5       99.3%
3   olmOCR v0.4.0       82.4       99.2%
4   PaddleOCR-VL        80.0       96.3%
5   dots.ocr 3B         79.1       95.2%
6   Mistral OCR 3       78.0       93.9%
7   Marker 1.10.0       76.5       92.1%
8   Marker 1.10.1       76.1       91.6%
9   DeepSeek OCR        75.7       91.1%
10  DeepSeek OCR        75.4       90.7%

Best Score: 83.1
Top Model: Chandra v0.1.0
Models Compared: 10
Score Range: 7.7
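
The summary figures above follow from the top-10 pass-rate scores by simple arithmetic. The sketch below reproduces them, assuming that "% of best" means each score divided by the best score and that "Score Range" is the best score minus the lowest of the ten; the page does not state either definition, and the helper name `summarize` is hypothetical.

```python
# Pass-rate scores for the top 10 entries, copied from the ranking above.
TOP10 = [
    ("Chandra v0.1.0", 83.1),
    ("Infinity-Parser 7B", 82.5),
    ("olmOCR v0.4.0", 82.4),
    ("PaddleOCR-VL", 80.0),
    ("dots.ocr 3B", 79.1),
    ("Mistral OCR 3", 78.0),
    ("Marker 1.10.0", 76.5),
    ("Marker 1.10.1", 76.1),
    ("DeepSeek OCR", 75.7),
    ("DeepSeek OCR", 75.4),
]


def summarize(rows):
    """Hypothetical helper: derive summary figures from (model, score) rows."""
    top_model, best = max(rows, key=lambda row: row[1])
    worst = min(score for _, score in rows)
    return {
        "best_score": best,
        "top_model": top_model,
        "models_compared": len(rows),
        # Assumption: range is best minus worst among the listed models.
        "score_range": round(best - worst, 1),
        # Assumption: "% of best" is each score relative to the best score.
        "pct_of_best": [round(100 * score / best, 1) for _, score in rows],
    }


summary = summarize(TOP10)
for (model, score), pct in zip(TOP10, summary["pct_of_best"]):
    print(f"{model:<20} {score:5.1f} {pct:6.1f}%")
```

Under these assumptions the computed values match the listed ones (for example 82.5 / 83.1 gives 99.3% and 83.1 - 75.4 gives 7.7), which suggests the percentages are normalized to the top model, not to a fixed maximum of 100.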

arxiv

#   Model               Type         Organization   Score  Date
1   Marker 1.10.0       Open Source  VikParuchuri   83.8   Dec 2025
2   Chandra v0.1.0      Open Source  datalab-to     82.2   Dec 2025

base

#   Model               Type         Organization   Score  Date
1   Chandra v0.1.0      Open Source  datalab-to     99.9   Dec 2025

headers-footers

#   Model               Type         Organization   Score  Date
1   olmOCR v0.3.0       Open Source  Allen AI       95.1   Dec 2025
2   Chandra v0.1.0      Open Source  datalab-to     90.8   Dec 2025

long-tiny-text

#   Model               Type         Organization   Score  Date
1   Chandra v0.1.0      Open Source  datalab-to     92.3   Dec 2025

multi-column

#   Model               Type         Organization   Score  Date
1   Chandra v0.1.0      Open Source  datalab-to     81.2   Dec 2025

old-scans

#   Model               Type         Organization   Score  Date
1   Chandra v0.1.0      Open Source  datalab-to     50.4   Dec 2025
2   GPT-4o              API          OpenAI         40.7   Dec 2025

old-scans-math

#   Model               Type         Organization   Score  Date
1   Chandra v0.1.0      Open Source  datalab-to     80.3   Dec 2025
2   olmOCR v0.3.0       Open Source  Allen AI       79.9   Dec 2025

pass-rate (Primary)

#   Model               Type         Organization   Score  Date
1   Chandra v0.1.0      Open Source  datalab-to     83.1   Dec 2025
2   Infinity-Parser 7B  Open Source  -              82.5   Dec 2025
3   olmOCR v0.4.0       Open Source  Allen AI       82.4   Dec 2025
4   PaddleOCR-VL        Open Source  Baidu          80.0   Dec 2025
5   dots.ocr 3B         Open Source  RedNote HILab  79.1   Dec 2025
6   Mistral OCR 3       API          Mistral        78.0   Dec 2025
7   Marker 1.10.0       Open Source  VikParuchuri   76.5   Dec 2025
8   Marker 1.10.1       Open Source  VikParuchuri   76.1   Dec 2025
9   DeepSeek OCR        Open Source  DeepSeek       75.7   Dec 2025
10  DeepSeek OCR        Open Source  DeepSeek       75.4   Dec 2025
11  MinerU 2.5          Open Source  OpenDataLab    75.2   Dec 2025
12  Mistral OCR 2       API          Mistral        72.0   Dec 2025
13  GPT-4o (Anchored)   -            OpenAI         69.9   Dec 2025
14  Nanonets OCR2 3B    -            Nanonets       69.5   Dec 2025
15  Gemini Flash 2      -            Google         63.8   Dec 2025

tables

#   Model               Type         Organization   Score  Date
1   dots.ocr 3B         Open Source  RedNote HILab  88.3   Dec 2025
2   Chandra v0.1.0      Open Source  datalab-to     88.0   Dec 2025
