olmOCR-Bench

Allen Institute for AI

7,010 unit tests across 1,402 PDF documents, testing the parsing of tables, math, multi-column layouts, old scans, and more.

Benchmark Stats

Models: 16
Papers: 28
Metrics: 9


Pass Rate

Percentage of unit tests passed

Higher is better

Rank | Model | Code | Score | Paper / Source | Note
1 | chandra-ocr-0.1.0 | HF | 83.1% | AlphaXiv | #1 overall on olmOCR-Bench
2 | infinity-parser-7b | - | 82.5% | AlphaXiv
3 | olmocr-v0.4.0 |  | 82.4% | AlphaXiv
4 | paddleocr-vl |  | 80% | AlphaXiv
5 | dots-ocr-3b |  | 79.1% | GitHub
6 | mistral-ocr-3 | - | 78% | mistral-announcement | Estimated from a 74% win rate vs. Mistral OCR 2
7 | marker-1.10.0 |  | 76.5% | GitHub
8 | marker-1.10.1 |  | 76.1% | AlphaXiv
9 | deepseek-ocr | - | 75.7% | AlphaXiv
10 | deepseek-ocr | - | 75.4% | GitHub | Chandra leads by 7.7 points
11 | mineru-2.5 |  | 75.2% | AlphaXiv
12 | mistral-ocr-api | - | 72% | AlphaXiv
13 | gpt-4o-anchored | - | 69.9% | GitHub | GPT-4o with anchored prompting
14 | nanonets-ocr2-3b | - | 69.5% | AlphaXiv
15 | gemini-flash-2 | - | 63.8% | GitHub
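The headline metric is described as the percentage of unit tests passed. A minimal sketch of that arithmetic, assuming each unit test is a simple pass/fail check and the score is the plain fraction of passed tests; the benchmark's own scoring script may aggregate differently (for example, per category), and `pass_rate` is a hypothetical helper, not part of olmOCR-Bench.

```python
from typing import Sequence


def pass_rate(results: Sequence[bool]) -> float:
    """Percentage (0-100) of unit tests that passed.

    Assumption: each test is a pass/fail boolean and the score is the
    plain fraction of passes, not a per-category average.
    """
    if not results:
        raise ValueError("no test results to score")
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(results) / len(results)


# Hypothetical example: 4 of 5 tests on one document pass.
print(pass_rate([True, True, False, True, True]))  # 80.0

# Page figures: 7,010 tests over 1,402 PDFs is 5 tests per document on average.
print(7010 / 1402)  # 5.0
```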

tables

Higher is better

Rank | Model | Code | Score | Paper / Source | Note
1 | dots-ocr-3b |  | 88.3 | GitHub | #1 on table recognition
2 | chandra-ocr-0.1.0 | HF | 88.0 | GitHub | Table recognition; near-best (dots.ocr: 88.3)

old-scans-math

Higher is better

Rank | Model | Code | Score | Paper / Source | Note
1 | chandra-ocr-0.1.0 | HF | 80.3 | GitHub | Mathematical notation in old scans; #1, leads by 0.4 points
2 | olmocr-v0.3.0 |  | 79.9 | GitHub | #2 on math in old scans

long-tiny-text

Higher is better

Rank | Model | Code | Score | Paper / Source | Note
1 | chandra-ocr-0.1.0 | HF | 92.3 | GitHub | Long documents with tiny text; #1 in category

base

Higher is better

Rank | Model | Code | Score | Paper / Source | Note
1 | chandra-ocr-0.1.0 | HF | 99.9 | GitHub | Parsing of clean base documents; near-perfect

headers-footers

Higher is better

Rank | Model | Code | Score | Paper / Source | Note
1 | olmocr-v0.3.0 |  | 95.1 | GitHub | #1 on header/footer extraction
2 | chandra-ocr-0.1.0 | HF | 90.8 | GitHub | Header/footer extraction

multi-column

Higher is better

Rank | Model | Code | Score | Paper / Source | Note
1 | chandra-ocr-0.1.0 | HF | 81.2 | GitHub | Multi-column document parsing

arxiv

Higher is better

Rank | Model | Code | Score | Paper / Source | Note
1 | marker-1.10.0 |  | 83.8 | GitHub | #1 on arXiv paper parsing
2 | chandra-ocr-0.1.0 | HF | 82.2 | GitHub | arXiv paper parsing; Marker leads (83.8)

old-scans

Higher is better

Rank | Model | Code | Score | Paper / Source | Note
1 | chandra-ocr-0.1.0 | HF | 50.4 | GitHub | Old scan recognition; #1 (GPT-4o: 40.7)
2 | gpt-4o | - | 40.7 | GitHub | #2 on old scans; Chandra leads by 9.7 points
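The point gaps quoted in the notes (for example, Chandra leading GPT-4o by 9.7 points on old scans) are plain differences between two scores. A minimal sketch that recomputes them from the #1 and #2 scores listed on this page; `TOP_TWO` and `lead_margin` are hypothetical names used for illustration, not part of any olmOCR-Bench tooling.

```python
# (#1 score, #2 score) per leaderboard, copied from the tables on this page.
TOP_TWO = {
    "overall": (83.1, 82.5),
    "tables": (88.3, 88.0),
    "old-scans-math": (80.3, 79.9),
    "headers-footers": (95.1, 90.8),
    "arxiv": (83.8, 82.2),
    "old-scans": (50.4, 40.7),
}


def lead_margin(first: float, second: float) -> float:
    """Gap between two scores in points, rounded to one decimal like the scores."""
    # Rounding hides float noise such as 50.4 - 40.7 == 9.699999999999996.
    return round(first - second, 1)


for category, (first, second) in TOP_TWO.items():
    print(f"{category}: {lead_margin(first, second)} points")

# Overall gap between chandra-ocr-0.1.0 and deepseek-ocr (GitHub row).
print(lead_margin(83.1, 75.4))  # 7.7
```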