Document Parsing | 2024 | en

OmniDocBench v1.5

981 annotated PDF pages across 9 document categories. Tests end-to-end document parsing including text, tables, and formulas.

Samples: 981
Metrics: composite, table-teds, layout-map, ocr-edit-distance
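
The edit-distance metrics listed above can be illustrated with a short sketch. It assumes the metric is a Levenshtein distance normalized by the length of the longer string, so that 0.0 means an exact match and 1.0 means nothing matches. The function names are illustrative and this is not the benchmark's official scoring code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(predicted: str, reference: str) -> float:
    """Edit distance scaled to [0, 1]; lower is better (assumed normalization)."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(predicted, reference) / longest


if __name__ == "__main__":
    print(normalized_edit_distance("OmniDocBench", "OmniDocBench"))
    print(normalized_edit_distance("kitten", "sitting"))
```

Under this reading, a score such as 0.099 means that roughly one character in ten of the output would have to change to match the reference.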
Current State of the Art

PaddleOCR-VL (Baidu): 92.86 composite

Top Models Performance Comparison

Top 10 models ranked by composite

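| # | Model | composite | % of best |
|---|---|---|---|
| 1 | PaddleOCR-VL | 92.9 | 100.0% |
| 2 | PaddleOCR-VL 0.9B | 92.6 | 99.7% |
| 3 | MinerU 2.5 | 90.7 | 97.6% |
| 4 | Qwen3-VL-235B | 89.2 | 96.0% |
| 5 | MonkeyOCR-pro-3B | 88.8 | 95.7% |
| 6 | OCRVerse 4B | 88.6 | 95.4% |
| 7 | dots.ocr 3B | 88.4 | 95.2% |
| 8 | Gemini 2.5 Pro | 88.0 | 94.8% |
| 9 | Qwen2.5-VL | 87.0 | 93.7% |
| 10 | Mistral OCR 3 | 79.8 | 85.9% |
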
Best Score: 92.9
Top Model: PaddleOCR-VL
Models Compared: 10
Score Range: 13.1
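
The summary figures above follow from the composite scores with simple arithmetic. The sketch below assumes that "% of best" is each score divided by the top score and that "Score Range" is the gap between the highest and lowest of the top 10 composite scores. Both readings are inferred from the numbers on the page, and the function names are illustrative.

```python
# Composite scores of the top 10 models, as listed on the leaderboard.
COMPOSITE_SCORES = {
    "PaddleOCR-VL": 92.86,
    "PaddleOCR-VL 0.9B": 92.56,
    "MinerU 2.5": 90.67,
    "Qwen3-VL-235B": 89.15,
    "MonkeyOCR-pro-3B": 88.85,
    "OCRVerse 4B": 88.56,
    "dots.ocr 3B": 88.41,
    "Gemini 2.5 Pro": 88.03,
    "Qwen2.5-VL": 87.02,
    "Mistral OCR 3": 79.75,
}


def percent_of_best(score: float, best: float) -> float:
    """Score as a percentage of the best score, rounded to one decimal place."""
    return round(100 * score / best, 1)


def score_range(scores: dict[str, float]) -> float:
    """Gap between the highest and lowest score, rounded to one decimal place."""
    return round(max(scores.values()) - min(scores.values()), 1)


if __name__ == "__main__":
    best = max(COMPOSITE_SCORES.values())
    for model, score in COMPOSITE_SCORES.items():
        print(f"{model}: {score} ({percent_of_best(score, best)}% of best)")
    print("Score range:", score_range(COMPOSITE_SCORES))
```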

composite (Primary)

| # | Model | Access | Organization | Score | Date |
|---|---|---|---|---|---|
| 1 | PaddleOCR-VL | Open Source | Baidu | 92.86 | Dec 2025 |
| 2 | PaddleOCR-VL 0.9B | Open Source | Baidu | 92.56 | Dec 2025 |
| 3 | MinerU 2.5 | Open Source | OpenDataLab | 90.67 | Dec 2025 |
| 4 | Qwen3-VL-235B | Open Source | Alibaba | 89.15 | Dec 2025 |
| 5 | MonkeyOCR-pro-3B | Open Source | | 88.85 | Dec 2025 |
| 6 | OCRVerse 4B | Open Source | | 88.56 | Dec 2025 |
| 7 | dots.ocr 3B | Open Source | RedNote HILab | 88.41 | Dec 2025 |
| 8 | Gemini 2.5 Pro | API | Google | 88.03 | Dec 2025 |
| 9 | Qwen2.5-VL | Open Source | Alibaba | 87.02 | Dec 2025 |
| 10 | Mistral OCR 3 | API | Mistral | 79.75 | Dec 2025 |
| 11 | mistral-ocr-2512 | | | 79.75 | Dec 2025 |
| 12 | clearOCR | API | TeamQuest | 31.7 | Dec 2025 |

formula-edit-distance

| # | Model | Access | Organization | Score | Date |
|---|---|---|---|---|---|
| 1 | Mistral OCR 3 | API | Mistral | 0.218 | Dec 2025 |
| 2 | clearOCR | API | TeamQuest | 0.902 | Dec 2025 |

layout-map

| # | Model | Access | Organization | Score | Date |
|---|---|---|---|---|---|
| 1 | MinerU 2.5 | Open Source | OpenDataLab | 97.5 | Dec 2025 |

ocr-edit-distance

| # | Model | Access | Organization | Score | Date |
|---|---|---|---|---|---|
| 1 | GPT-4o | API | OpenAI | 0.020 | Dec 2025 |

reading-order

| # | Model | Access | Organization | Score | Date |
|---|---|---|---|---|---|
| 1 | Mistral OCR 3 | API | Mistral | 91.63 | Dec 2025 |
| 2 | clearOCR | API | TeamQuest | 86.04 | Dec 2025 |

table-teds

| # | Model | Access | Organization | Score | Date |
|---|---|---|---|---|---|
| 1 | PaddleOCR-VL | Open Source | Baidu | 93.52 | Dec 2025 |
| 2 | Mistral OCR 3 | API | Mistral | 70.88 | Dec 2025 |
| 3 | clearOCR | API | TeamQuest | 0.800 | Dec 2025 |

text-edit-distance

| # | Model | Access | Organization | Score | Date |
|---|---|---|---|---|---|
| 1 | Mistral OCR 3 | API | Mistral | 0.099 | Dec 2025 |
| 2 | clearOCR | API | TeamQuest | 0.154 | Dec 2025 |


OmniDocBench Benchmark - Document Parsing | CodeSOTA