CVPR 2025 Benchmark

Parsing Every Document

OmniDocBench is the most comprehensive benchmark for PDF document parsing, evaluating text extraction, table recognition, formula detection, and layout analysis across 9 diverse document types.

Benchmark Stats

  • 981 Annotated Pages
  • 92.9 SOTA Composite Score
  • 7 Metrics Tracked
  • 13 Models Evaluated

What is OmniDocBench?

OmniDocBench is a comprehensive document parsing benchmark created by Shanghai AI Laboratory and accepted at CVPR 2025. It evaluates the ability of AI systems to convert PDF documents into structured formats like Markdown, preserving text, tables, formulas, and reading order.

Unlike earlier benchmarks that focus on narrow document types (only academic papers, or only scanned receipts), OmniDocBench covers 9 diverse document categories including academic papers, textbooks, slides, financial reports, newspapers, handwritten notes, exam papers, magazines, and research reports.

The benchmark uses 19 layout categories and 15 attribute labels for multi-level annotation, enabling both end-to-end evaluation and fine-grained task-specific analysis. This makes it the most thorough document parsing evaluation available.

Key Properties

  • Multi-source Coverage: 9 document types, from academic papers to handwritten notes
  • Multi-level Annotations: 19 layout categories, 15 attribute labels
  • Composite Scoring: balanced metric across text, tables, and formulas
  • Pipeline + VLM Evaluation: compares traditional pipelines and vision-language models
  • Open Access: dataset and evaluation code publicly available on GitHub

The Document Parsing Pipeline

The pipeline runs from raw PDF pages to structured Markdown output; understanding each stage reveals where models succeed and where they fail.

1. Input: PDF Document

Raw PDF pages with mixed content: text paragraphs, tables, mathematical formulas, figures, headers, footers, and complex multi-column layouts.

2. Detection: Layout Analysis

Detect and classify 19 layout elements: text blocks, tables, formulas, figures, titles, headers, footers, page numbers, captions, and more.

3. Extraction: Content Recognition

Each detected region gets specialized processing: OCR for text, structure recognition for tables (HTML/LaTeX), and LaTeX conversion for formulas.

4. Output: Structured Format

Final Markdown/HTML output preserving reading order, table structure, formula notation, and document hierarchy. Ready for downstream tasks.

PDF Page → Layout Detection → Text OCR + Table Structure + Formula LaTeX → Structured Markdown

End-to-end VLMs (like Qwen3-VL, Gemini 2.5 Pro) collapse stages 2-4 into a single forward pass. Pipeline methods (MinerU, PaddleOCR) use specialized models per stage.
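Stage 3 of a pipeline method can be sketched as a dispatcher that routes each detected region to a specialized recognizer. Everything below (the `Region` type, handler names, output formats) is an illustrative placeholder, not any specific library's API:

```python
from dataclasses import dataclass

# Illustrative output of the layout-analysis stage.
@dataclass
class Region:
    kind: str      # "text", "table", or "formula"
    content: str   # stand-in for the cropped region's pixels

def recognize_text(region: Region) -> str:     # placeholder for an OCR model
    return region.content

def recognize_table(region: Region) -> str:    # placeholder for a table-structure model
    return f"<table><tr><td>{region.content}</td></tr></table>"

def recognize_formula(region: Region) -> str:  # placeholder for a LaTeX recognizer
    return f"${region.content}$"

HANDLERS = {
    "text": recognize_text,
    "table": recognize_table,
    "formula": recognize_formula,
}

def parse_page(regions: list[Region]) -> str:
    """Route each detected region to its specialized recognizer
    and join the results in reading order."""
    return "\n\n".join(HANDLERS[r.kind](r) for r in regions)

page = [Region("text", "Introduction"), Region("formula", "E = mc^2")]
print(parse_page(page))
```

Because each handler is an independent model, a mistake in layout detection (stage 2) propagates: a table misclassified as text gets OCR'd instead of structure-recognized, which is exactly the error cascade the surrounding text describes.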

OmniDocBench Composite Leaderboard

Composite Score = ((1 − TextEditDist) × 100 + TableTEDS + FormulaCDM) / 3. Higher is better.

| Rank | Model | Composite | Source |
|------|-------|-----------|--------|
| 1 | PaddleOCR-VL | 92.86 | alphaxiv-leaderboard |
| 2 | PaddleOCR-VL 0.9B | 92.56 | alphaxiv-leaderboard |
| 3 | MinerU 2.5 | 90.67 | alphaxiv-leaderboard |
| 4 | Qwen3-VL 235B | 89.15 | alphaxiv-leaderboard |
| 5 | MonkeyOCR Pro 3B | 88.85 | alphaxiv-leaderboard |
| 6 | OCRVerse 4B | 88.56 | github-leaderboard |
| 7 | dots.ocr 3B | 88.41 | github-leaderboard |
| 8 | Gemini 2.5 Pro | 88.03 | alphaxiv-leaderboard |
| 9 | Qwen2.5-VL | 87.02 | alphaxiv-leaderboard |
| 10 | Mistral OCR (2512) | 79.75 | codesota-verified |
| 11 | Mistral OCR 3 | 79.75 | codesota-verified |
| 12 | clearOCR (TeamQuest) | 31.70 | codesota-verified |

Best Scores by Metric

Individual metric leaders across all tracked OmniDocBench dimensions.

Text Edit Distance

Character-level edit distance for OCR accuracy. Lower is better.

  • GPT-4o — 0.020
  • Mistral OCR 3 — 0.099
  • clearOCR (TeamQuest) — 0.154

Table TEDS

Tree Edit Distance Score for table structure. Higher is better.

  • PaddleOCR-VL — 93.52
  • Mistral OCR 3 — 70.88
  • clearOCR (TeamQuest) — 0.80

Layout mAP

Mean Average Precision for layout detection. Higher is better.

  • MinerU 2.5 — 97.5

Formula Edit Distance

LaTeX formula recognition accuracy. Lower is better.

  • Mistral OCR 3 — 0.218
  • clearOCR (TeamQuest) — 0.902

Reading Order

Accuracy of element reading order. Higher is better.

  • Mistral OCR 3 — 91.63
  • clearOCR (TeamQuest) — 86.04

Why Document Parsing is Hard

Document parsing sits at the intersection of computer vision (layout detection, figure recognition), NLP (text extraction, reading order), and structured prediction (table/formula reconstruction).

  • Layout Diversity: Academic papers, newspapers, and slides have radically different layouts
  • Nested Structures: Tables within tables, formulas within table cells, multi-column text flows
  • OCR Errors Cascade: A single misread character in a formula renders the entire equation wrong
  • Language Agnosticism: Documents span dozens of languages with different scripts

The Rise of Vision-Language Models

Traditional document parsing relied on pipeline approaches: separate models for layout detection, OCR, table recognition, and formula detection. Each module could be optimized independently but errors cascaded between stages.

Now, end-to-end VLMs like Qwen3-VL and Gemini 2.5 Pro convert entire pages in a single forward pass. They score competitively on OmniDocBench without any document-specific training.

However, pipeline methods like PaddleOCR-VL and MinerU still hold the top spots, suggesting that specialized architectures remain valuable for structured document understanding.

Understanding the Metrics

Text Edit Distance

Measures character-level accuracy of extracted text against ground truth using normalized Levenshtein distance. A score of 0.02 means only 2% of characters need editing.

Lower is better. Range: 0.0 (perfect) to 1.0 (completely wrong)
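A minimal implementation of this metric, assuming the standard dynamic-programming Levenshtein definition normalized by the longer string's length:

```python
def normalized_edit_distance(pred: str, gt: str) -> float:
    """Levenshtein distance divided by the longer string's length,
    so 0.0 is a perfect match and 1.0 is completely wrong."""
    m, n = len(pred), len(gt)
    if max(m, n) == 0:
        return 0.0
    # Classic DP table, kept one row at a time.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == gt[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[n] / max(m, n)

print(normalized_edit_distance("OmniDocBench", "OmniDocBench"))  # 0.0
```

One substituted character in a 50-character line yields 0.02, matching the "2% of characters need editing" reading above.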

Table TEDS (Tree Edit Distance Score)

Evaluates table structure recognition by comparing the predicted HTML/LaTeX table tree against the ground truth tree. Captures both cell content and structural accuracy.

Higher is better. Range: 0 to 100
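Full TEDS runs a tree edit distance (Zhang–Shasha) over the table's HTML tree. As a rough intuition only, here is a much simpler stand-in that compares flattened cell sequences and ignores row/column structure — not the official TEDS implementation:

```python
from html.parser import HTMLParser
from difflib import SequenceMatcher

class CellExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, in document order."""
    def __init__(self):
        super().__init__()
        self.cells, self._in_cell = [], False
    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")
    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data.strip()

def cell_similarity(pred_html: str, gt_html: str) -> float:
    """0-100 score over the flattened cell sequences (content only)."""
    p, g = CellExtractor(), CellExtractor()
    p.feed(pred_html)
    g.feed(gt_html)
    return 100 * SequenceMatcher(None, p.cells, g.cells).ratio()

gt = "<table><tr><td>a</td><td>b</td></tr></table>"
print(cell_similarity(gt, gt))  # 100.0
```

Real TEDS additionally penalizes structural errors (wrong rowspan/colspan, merged or split rows), which this cell-only sketch cannot see.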

Layout mAP (Mean Average Precision)

Standard object detection metric applied to document layout elements. Measures how accurately the model detects and classifies text blocks, tables, figures, formulas, etc.

Higher is better. Range: 0 to 100
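The matching criterion underlying mAP is intersection-over-union between predicted and ground-truth boxes. A minimal sketch of IoU follows; real mAP additionally sweeps confidence thresholds and averages precision across the layout classes:

```python
def iou(box_a: tuple, box_b: tuple) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes.
    A prediction typically counts as a hit when IoU >= 0.5."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (clamped to zero if the boxes are disjoint).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

# A predicted table box half-overlapping the ground-truth box:
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))
```

Two boxes that each cover 100×100 pixels and overlap on half their width share one third of their combined area, so the call above returns 1/3 — below the usual 0.5 hit threshold.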

Formula CDM (Character Detection Matching)

Evaluates mathematical formula recognition by matching detected characters and symbols against ground truth LaTeX. Captures both symbol accuracy and spatial arrangement.

Higher is better. Used in composite score calculation

Composite Score Formula

Composite = ((1 - TextEditDist) × 100 + TableTEDS + FormulaCDM) / 3

This balanced formula ensures models must excel at all three core tasks. A model strong at OCR but weak at tables will be penalized.
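The formula translates directly into code; the metric values in the example are purely illustrative, not taken from the leaderboard:

```python
def composite_score(text_edit_dist: float, table_teds: float,
                    formula_cdm: float) -> float:
    """OmniDocBench composite:
    ((1 - TextEditDist) * 100 + TableTEDS + FormulaCDM) / 3."""
    return ((1 - text_edit_dist) * 100 + table_teds + formula_cdm) / 3

# Hypothetical model: 0.05 text edit distance, 90 TEDS, 88 CDM.
print(round(composite_score(0.05, 90.0, 88.0), 2))  # 91.0
```

Because each of the three terms is on a 0-100 scale, a 10-point weakness in any one task costs the same ~3.3 composite points, which is what forces balanced performance.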

Related Benchmarks Comparison

How OmniDocBench compares to other document understanding benchmarks.

| Benchmark | Focus | Documents | Doc Types | Key Metric | Year |
|-----------|-------|-----------|-----------|------------|------|
| OmniDocBench | End-to-end parsing | 981 | 9 categories | Composite (Text + Table + Formula) | 2024 |
| DocLayNet | Layout detection | 80,863 | 6 categories | mAP@0.5 | 2022 |
| PubLayNet | Layout detection | 360,000+ | Academic papers | mAP | 2019 |
| olmOCR-Bench | PDF extraction | 1,402 | Mixed PDFs | Pass Rate (unit tests) | 2025 |
| OCRBench v2 | OCR capabilities | 10,000+ | 23 task types | Overall Score | 2024 |
| TableBank | Table detection | 417,234 | Academic papers | F1 Score | 2019 |
| CC-OCR | Multi-scene OCR | — | 4 task domains | F1 Score | 2024 |

OmniDocBench is unique in evaluating the full end-to-end parsing pipeline (text + tables + formulas + layout) on diverse document types, rather than focusing on a single sub-task.


Have OmniDocBench Results?

If you have run your model on OmniDocBench and want to be listed on this leaderboard with verified results, submit your scores for independent verification.