
OmniDocBench

Shanghai AI Laboratory

Comprehensive benchmark for evaluating PDF document parsing models across diverse document types with multi-level annotations.

Total Results: 23
Models Tested: 13
Metrics: 7
Last Updated: 2025-12-21

Composite Score

((1-TextEditDist)*100 + TableTEDS + FormulaCDM) / 3

Higher is better

Rank | Model | Score | Source
1 | paddleocr-vl | 92.86 | alphaxiv-leaderboard
  End-to-end document parsing.
2 | paddleocr-vl-0.9b | 92.56 | alphaxiv-leaderboard
3 | mineru-2.5 | 90.67 | alphaxiv-leaderboard
4 | qwen3-vl-235b | 89.15 | alphaxiv-leaderboard
5 | monkeyocr-pro-3b | 88.85 | alphaxiv-leaderboard
6 | ocrverse-4b | 88.56 | github-leaderboard
  4B-parameter model. Text Edit: 0.058, Formula CDM: 86.91, Table TEDS: 84.55.
7 | dots-ocr-3b | 88.41 | github-leaderboard
  3B-parameter model. Text Edit: 0.048, Formula CDM: 83.22, Table TEDS: 86.78.
8 | gemini-25-pro | 88.03 | alphaxiv-leaderboard
9 | qwen25-vl | 87.02 | alphaxiv-leaderboard
10 | mistral-ocr-2512 | 79.75 | codesota-verified
  Alias of mistral-ocr-3; results are identical. Text: 90.1%, Tables: 70.9%, Formula: 78.2%.
11 | mistral-ocr-3 | 79.75 | codesota-verified
  Independently verified by CodeSOTA on the full benchmark (1,355 images). Text Edit: 0.099 (90.1%), Formula Edit: 0.218 (78.2%), Table TEDS: 70.9%, Reading Order: 91.6%.
12 | clearocr-teamquest | 31.7 | codesota-verified
  Independently verified by CodeSOTA. Traditional OCR: text only, no table or formula recognition. Text Edit: 0.154 (84.6%), Table TEDS: 0.8%, Formula Edit: 0.902.
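The composite formula can be checked against the sub-scores listed in the notes. Below is a minimal Python sketch of that formula; the function name `composite_score` and the generic `formula_score` parameter are illustrative assumptions, not part of the leaderboard's tooling. For clearocr-teamquest, which reports a formula edit distance rather than a CDM score, the listed 31.7 is reproduced only if (1 - FormulaEdit) * 100 is used in place of FormulaCDM; that substitution is inferred from the arithmetic and is not documented on this page.

```python
def composite_score(text_edit_dist: float, table_teds: float, formula_score: float) -> float:
    """Composite = ((1 - TextEditDist) * 100 + TableTEDS + FormulaScore) / 3.

    text_edit_dist is a normalized edit distance in [0, 1];
    table_teds and formula_score are on a 0-100 scale.
    """
    text_score = (1.0 - text_edit_dist) * 100.0
    return (text_score + table_teds + formula_score) / 3.0


# Sub-scores copied from the notes in the composite table.
ocrverse = composite_score(0.058, 84.55, 86.91)  # listed composite: 88.56
dots_ocr = composite_score(0.048, 86.78, 83.22)  # listed composite: 88.41

# Assumption: text-only entries substitute (1 - FormulaEdit) * 100 for FormulaCDM.
clearocr = composite_score(0.154, 0.8, (1.0 - 0.902) * 100.0)  # listed composite: 31.7

for name, score in [("ocrverse-4b", ocrverse), ("dots-ocr-3b", dots_ocr),
                    ("clearocr-teamquest", clearocr)]:
    print(f"{name}: {score:.2f}")
```

Run as-is, this prints 88.55, 88.40 and 31.73. The first two sit 0.01 below the listed composites, presumably because the sub-scores shown in the notes are rounded.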

text-edit-distance

Lower is better

Rank | Model | Score | Source
1 | mistral-ocr-3 | 0.10 | codesota-verified
  Text block recognition: 90.1% accuracy. Best on academic papers (97.9%) and exam papers (92.8%).
2 | clearocr-teamquest | 0.15 | codesota-verified
  Text block recognition: 84.6% accuracy. Best on research reports (95.4%) and academic papers (95.0%).
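Edit-distance metrics compare predicted and ground-truth strings character by character. A common normalization, assumed here rather than taken from OmniDocBench's evaluation code, divides the Levenshtein distance by the length of the longer string, so 0.0 means identical and 1.0 is the worst case; the accuracy percentages in the notes (for example, 0.099 corresponds to 90.1%) match (1 - distance) * 100. The helper names are hypothetical, and any text cleanup the benchmark applies before comparison is not modeled.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution, free when characters match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(pred: str, gt: str) -> float:
    """Edit distance divided by the longer string's length (assumed normalization)."""
    longest = max(len(pred), len(gt))
    if longest == 0:
        return 0.0
    return levenshtein(pred, gt) / longest


# Hypothetical OCR output with a classic "rn" for "m" confusion.
ned = normalized_edit_distance("Docurnent parsing", "Document parsing")
print(f"edit distance: {ned:.3f}, accuracy: {(1 - ned) * 100:.1f}%")
```

The "rn" confusion costs one substitution plus one insertion, so the distance is 2/17, about 0.118.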

Table TEDS

Tree-Edit-Distance-based Similarity (TEDS) for table recognition

Higher is better

Rank | Model | Score | Source
1 | paddleocr-vl | 93.52 | alphaxiv-leaderboard
  Table structure recognition score (TEDS).
2 | mistral-ocr-3 | 70.88 | codesota-verified
  Table structure recognition. TEDS Structure: 75.3%. Best on exam papers (88.0%).
3 | clearocr-teamquest | 0.80 | codesota-verified
  No structured table recognition; outputs tables as plain text.
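TEDS, as introduced with the PubTabNet dataset, represents each table as an HTML tree and normalizes the tree edit distance between prediction and ground truth by the size of the larger tree. The worked example uses hypothetical node counts purely to show the arithmetic; the "TEDS Structure" figure in the mistral-ocr-3 note is conventionally the variant that compares table structure only and ignores cell text.

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}

% Hypothetical example: |T_a| = 20 nodes, |T_b| = 18 nodes, EditDist = 3
\mathrm{TEDS} = 1 - \frac{3}{\max(20, 18)} = 1 - 0.15 = 0.85 \quad (\text{reported as } 85\%)
```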

formula-edit-distance

Lower is better

Rank | Model | Score | Source
1 | mistral-ocr-3 | 0.22 | codesota-verified
  Display formula recognition: 78.2% accuracy.
2 | clearocr-teamquest | 0.90 | codesota-verified
  No LaTeX formula recognition; outputs formulas as plain text.

reading-order

Higher is better

Rank | Model | Score | Source
1 | mistral-ocr-3 | 91.63 | codesota-verified
  Reading-order accuracy; reading-order edit distance: 8.4%.
2 | clearocr-teamquest | 86.04 | codesota-verified
  Reading-order accuracy; reading-order edit distance: 14.0%.
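The reading-order scores are 100 minus the reading-order edit distance in percent (91.63 is roughly 100 - 8.4). The sketch below assumes that distance is a Levenshtein distance over the sequence of block IDs, normalized by the longer sequence; that is an assumption about the protocol, not OmniDocBench's documented implementation, and the function names and block IDs are hypothetical.

```python
from typing import Hashable, Sequence


def sequence_edit_distance(pred: Sequence[Hashable], gt: Sequence[Hashable]) -> int:
    """Levenshtein distance over sequences of block IDs rather than characters."""
    previous = list(range(len(gt) + 1))
    for i, p in enumerate(pred, start=1):
        current = [i]
        for j, g in enumerate(gt, start=1):
            current.append(min(
                previous[j] + 1,             # drop an extra predicted block
                current[j - 1] + 1,          # insert a missing block
                previous[j - 1] + (p != g),  # substitute, free when IDs match
            ))
        previous = current
    return previous[-1]


def reading_order_accuracy(pred: Sequence[Hashable], gt: Sequence[Hashable]) -> float:
    """(1 - normalized edit distance) * 100, i.e. 100 minus the error in percent."""
    longest = max(len(pred), len(gt))
    if longest == 0:
        return 100.0
    return (1.0 - sequence_edit_distance(pred, gt) / longest) * 100.0


ground_truth = [0, 1, 2, 3, 4]  # hypothetical block IDs in annotated reading order
predicted = [0, 2, 1, 3, 4]     # two adjacent blocks read in the wrong order
print(f"reading-order accuracy: {reading_order_accuracy(predicted, ground_truth):.1f}")
```

Swapping two adjacent blocks costs two substitutions, so accuracy on this five-block toy page drops to 60.0.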

OCR Edit Distance

Character-level edit distance for text extraction

Lower is better

Rank | Model | Score | Source
1 | gpt-4o | 0.02 | alphaxiv-leaderboard
  Best on English text extraction.

Layout mAP

Mean Average Precision for layout detection

Higher is better

Rank | Model | Score | Source
1 | mineru-2.5 | 97.5 | alphaxiv-leaderboard
  Highest reported layout detection mAP.
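Layout mAP averages, over layout classes, the area under each class's precision-recall curve. The sketch below assumes a single IoU threshold of 0.5, greedy score-ordered matching and all-point interpolation (PASCAL VOC style); OmniDocBench's actual protocol (IoU thresholds, class list, COCO-style averaging) may differ, and all function names and boxes are hypothetical. Scores here are on a 0-1 scale, whereas the leaderboard reports mAP multiplied by 100.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[float, Box]], ground_truth: List[Box],
                      iou_thr: float = 0.5) -> float:
    """AP for one class: area under the all-point interpolated precision-recall curve."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp_flags = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Greedily match each detection to the best still-unmatched ground-truth box.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if not matched[idx] and overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_thr:
            matched[best_idx] = True
            tp_flags.append(1)
        else:
            tp_flags.append(0)

    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(ground_truth))

    # Make precision non-increasing from right to left, then sum rectangles under the curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Tuple[float, Box]], List[Box]]]) -> float:
    """mAP: mean of per-class APs; per_class maps class -> (scored detections, ground truth)."""
    aps = [average_precision(dets, gts) for dets, gts in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


per_class = {
    "text": ([(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 31))],
             [(0, 0, 10, 10), (20, 20, 30, 30)]),
    "table": ([(0.95, (0, 0, 100, 50))], [(0, 0, 100, 50)]),
}
layout_map = mean_average_precision(per_class)
print(f"mAP: {layout_map:.3f} (x100 = {layout_map * 100:.1f})")
```

In the toy example, the text class has one false positive between two true positives (AP = 5/6) and the table box matches exactly (AP = 1.0), so the printed mAP is their mean, about 0.917.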
