
I Ran the Same PDF Through Docling and MinerU

December 2025. Real test, real numbers.

Docling and MinerU are both modern PDF extraction tools built for complex documents. MinerU holds the top spot on layout detection benchmarks with 97.5 mAP. Docling focuses on speed and simplicity. The question is whether benchmark performance translates to real-world usability.

The Test

I ran the same academic paper PDF, with tables, figures, and complex layouts, through both tools. I recorded speed, layout accuracy, table and equation handling, setup effort, and license.


Test PDF. 12 pages, academic paper with tables, figures, and equations.
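To reproduce a timing comparison like this on your own PDFs, a small harness is enough. This is a minimal sketch, not the script used for this test. `time_extraction` is a hypothetical helper. It takes any extraction callable (for example, a thin wrapper around Docling or the MinerU CLI), runs it several times, and reports the median wall-clock time and seconds per page.

```python
import statistics
import time
from typing import Callable


def time_extraction(extract: Callable[[str], object], pdf_path: str,
                    n_pages: int, runs: int = 3) -> dict:
    """Run `extract(pdf_path)` `runs` times and summarize wall-clock time.

    Hypothetical helper: `extract` is any function that processes the PDF.
    """
    if n_pages <= 0 or runs <= 0:
        raise ValueError("n_pages and runs must be positive")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        extract(pdf_path)
        timings.append(time.perf_counter() - start)
    median = statistics.median(timings)
    return {"median_s": median, "s_per_page": median / n_pages, "runs": timings}


if __name__ == "__main__":
    def fake_extract(path: str) -> None:
        time.sleep(0.01)  # stand-in so the harness runs without any PDF tooling

    print(time_extraction(fake_extract, "research_paper.pdf", n_pages=12))
```

The first call usually includes model loading. Consider discarding a warm-up run before comparing tools.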

The Results

| Metric | Docling | MinerU 2.5 |
|--------|---------|------------|
| Time (12 pages) | 8.2 s | 14.7 s |
| Layout detection mAP | 93.1 | 97.5 |
| Table structure preserved | Partial | Full |
| Equation handling | Basic | LaTeX output |
| Figure extraction | Yes | Yes |
| Setup complexity | Simple | Medium |
| License | MIT | AGPL-3.0 |
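Per-page figures make the speed gap easier to reason about for batch jobs. This sketch converts the two timings from the table into seconds per page, pages per hour, and a speedup ratio. It assumes throughput scales linearly with page count, which real workloads (warm-up, batching, very dense pages) may not do.

```python
# Timings from the results table: one 12-page paper, wall-clock seconds
PAGES = 12
TIMINGS_S = {"Docling": 8.2, "MinerU 2.5": 14.7}

for tool, seconds in TIMINGS_S.items():
    per_page = seconds / PAGES
    pages_per_hour = 3600 / per_page  # naive linear extrapolation
    print(f"{tool}: {per_page:.2f} s/page, ~{pages_per_hour:,.0f} pages/hour")

speedup = TIMINGS_S["MinerU 2.5"] / TIMINGS_S["Docling"]
print(f"Docling speedup: {speedup:.2f}x")
```

That works out to roughly 0.68 s/page for Docling versus about 1.2 s/page for MinerU, a speedup of about 1.8x.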

Docling: Fast and Practical

Docling processed the 12-page PDF in 8.2 seconds, about 1.8x faster than MinerU's 14.7 seconds. It detected all major elements (titles, sections, tables, figures) but struggled with complex table structures.

[TITLE] Deep Learning for Document Understanding
[AUTHOR] Smith et al.
[ABSTRACT] We propose a novel approach...
[SECTION] 1. Introduction
[TEXT] Document understanding has become...
[TABLE]
Detected but structure partially lost
[FIGURE_CAPTION] Figure 1: Architecture overview
...

Tables were detected but the internal structure was partially lost. For simple extraction where you need text and basic structure, Docling delivers quickly.

MinerU: Accurate but Slower

MinerU took 14.7 seconds, but the results were nearly perfect. Its 97.5 mAP on layout detection is not just a benchmark number; it shows in real documents.

[title] Deep Learning for Document Understanding
[author] Smith et al.
[abstract] We propose a novel approach...
[section] 1. Introduction
[text] Document understanding has become...
[table]
| Model | Accuracy | Speed |
|-------|----------|-------|
| Ours  | 94.2%    | 12ms  |
| SOTA  | 92.1%    | 18ms  |
[figure_caption] Figure 1: Architecture overview
...

Tables were preserved with proper Markdown formatting, and every cell aligned correctly. Equations were converted to LaTeX. The structure is usable as-is.
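You can check cell alignment mechanically instead of by eye. This is a minimal sketch. `check_markdown_table` is a hypothetical helper, and it assumes simple pipe tables with no escaped `|` inside cells. It verifies that the separator row is well formed and that every row has as many cells as the header. It passes on the MinerU table above.

```python
import re


def check_markdown_table(table: str) -> list[str]:
    """Return a list of problems found in a simple pipe table (empty list = OK)."""
    rows = [line.strip() for line in table.strip().splitlines() if line.strip()]
    if len(rows) < 2:
        return ["table needs at least a header and a separator row"]

    def cells(row: str) -> list[str]:
        # Drop the outer pipes, then split the remaining cells
        return [c.strip() for c in row.strip("|").split("|")]

    problems = []
    width = len(cells(rows[0]))
    separator = cells(rows[1])
    if len(separator) != width or not all(re.fullmatch(r":?-+:?", c) for c in separator):
        problems.append("row 2 is not a valid separator row")
    for i, row in enumerate(rows[2:], start=3):
        n = len(cells(row))
        if n != width:
            problems.append(f"row {i} has {n} cells, header has {width}")
    return problems


MINERU_TABLE = """
| Model | Accuracy | Speed |
|-------|----------|-------|
| Ours  | 94.2%    | 12ms  |
| SOTA  | 92.1%    | 18ms  |
"""
print(check_markdown_table(MINERU_TABLE) or "table is well formed")
```

This check only catches rows with the wrong number of cells. A value that lands in the wrong column of a correctly sized row still needs a comparison against the source PDF.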

Table Extraction Quality

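This is where MinerU pulls ahead. Academic papers live or die on table accuracy. Strictly speaking, layout mAP measures how well a tool locates regions such as tables on the page, not whether it recovers the cells inside them. In this test, though, MinerU's higher layout score went hand in hand with better table structure.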
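Layout-detection mAP is built on box overlap. A predicted region (say, a table) counts as a hit when its intersection-over-union (IoU) with the ground-truth box exceeds a threshold. Per-class average precision (the area under the precision-recall curve) is then averaged across classes, and on some benchmarks across several IoU thresholds as well. The sketch below computes IoU for axis-aligned boxes given as (x0, y0, x1, y1). The coordinates are made up for illustration. A perfectly placed table box scores 1.0 no matter what happened to the cells inside it.

```python
def iou(box_a: tuple[float, float, float, float],
        box_b: tuple[float, float, float, float]) -> float:
    """Intersection-over-union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)  # zero if boxes don't overlap

    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical table region on a page, in PDF points
ground_truth = (72, 300, 540, 420)
predicted = (70, 305, 540, 425)
print(f"IoU = {iou(ground_truth, predicted):.3f}")
```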

Docling detected all 3 tables in the test paper but lost column alignment in complex multi-row headers. MinerU preserved everything, including merged cells and nested structures.
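GitHub-flavored Markdown pipe tables allow exactly one header row and no merged cells. Any extractor that emits Markdown therefore has to flatten multi-row headers somehow, and doing that badly is one way columns end up misaligned. The sketch below shows one common strategy: joining stacked labels into `Parent / Child` names. It is illustrative only. `flatten_headers` is a hypothetical helper, the header is invented, and neither tool is claimed to work this way.

```python
def flatten_headers(header_rows: list[list[str]]) -> list[str]:
    """Collapse stacked header rows into one row of 'Parent / Child' names.

    Assumes merged (spanning) cells were expanded so the parent label is
    repeated in every column it covers, and all rows have equal length;
    empty strings mean "no label in this row".
    """
    flat = []
    for column in zip(*header_rows):
        parts = []
        for label in column:
            if label and label not in parts:  # skip blanks and vertical repeats
                parts.append(label)
        flat.append(" / ".join(parts))
    return flat


# Hypothetical two-level header: "Accuracy" spans two sub-columns
header = [
    ["Model", "Accuracy", "Accuracy", "Speed"],
    ["",      "Top-1",    "Top-5",    "(ms)"],
]
print(flatten_headers(header))
```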

Equation and Formula Support

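MinerU converts mathematical notation to LaTeX automatically. For research papers or technical documents, this is critical. With its default pipeline, Docling extracts equations as text but does not preserve mathematical structure. It does offer an optional formula-enrichment step, which the code below does not enable.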

Example: E = mc² comes out correctly from both, but ∫₀^∞ e^(-x²) dx only preserves structure in MinerU's LaTeX output.
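For reference, here are both examples written in LaTeX. The limits, the nested superscript in the exponent, and the differential are all explicit, so the formula can be re-rendered or processed further. The second expression is the half Gaussian integral, which equals √π/2. MinerU's exact spacing and brace style may differ from this.

```latex
E = mc^2
\qquad
\int_{0}^{\infty} e^{-x^{2}}\,dx = \frac{\sqrt{\pi}}{2}
```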

License Considerations

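Docling's codebase is MIT-licensed, which is permissive for commercial use (the models it downloads carry their own licenses). MinerU uses AGPL-3.0, a strong copyleft license. If you distribute software that includes it, you must release the corresponding source under the same license. Unlike the plain GPL, the AGPL also covers network use. If you modify MinerU and let users interact with it over a network, as in a SaaS product, you must offer those users the modified source. SaaS deployment is therefore not an automatic way around AGPL obligations.

If you are building a commercial product, whether you ship it or host it, Docling's permissive license is the safer choice. For internal tools or research, the license matters less.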

The Code

Docling

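from docling.document_converter import DocumentConverter
from docling_core.types.doc import TableItem

converter = DocumentConverter()  # default PDF pipeline
doc = converter.convert("research_paper.pdf").document  # a DoclingDocument

# Walk elements in reading order; tables have no .text, so render them as Markdown
for item, _level in doc.iterate_items():
    if isinstance(item, TableItem):
        print(item.export_to_markdown(doc=doc))
    else:
        print(f"{item.label.value}: {getattr(item, 'text', '')}")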

MinerU

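import subprocess
from pathlib import Path

# Assumes MinerU 2.x is installed and its `mineru` CLI is on PATH.
# -p is the input file or folder, -o the output folder; -b selects the
# backend (see `mineru --help` for the options your version supports).
out_dir = Path("mineru_output")
subprocess.run(["mineru", "-p", "research_paper.pdf", "-o", str(out_dir)], check=True)

# MinerU writes Markdown (plus JSON and extracted images) into a per-document
# subfolder whose exact name depends on the backend, so search for it.
for md_file in sorted(out_dir.rglob("*.md")):
    print(md_file.read_text(encoding="utf-8"))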

My Recommendation

Use MinerU when table accuracy is critical, when you are processing academic or technical PDFs, when you need LaTeX equations, or when structure preservation matters more than speed.

Use Docling when speed matters, your documents are simple, you need a permissive license (Docling is MIT-licensed), or basic extraction is sufficient.

For research papers, financial reports, or any document where table and formula accuracy matters, MinerU's extra processing time is worth it. For bulk processing of simpler documents, Docling's speed advantage is compelling.
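For a mixed corpus, the rules above can be encoded as a simple router. This is a hypothetical sketch. The `DocProfile` fields and the rule order (license first, then speed, then structure needs) are my assumptions about how to combine the criteria. Neither tool provides anything like this.

```python
from dataclasses import dataclass


@dataclass
class DocProfile:
    """Hypothetical per-document traits; how you detect them is up to your pipeline."""
    has_complex_tables: bool = False
    needs_latex_equations: bool = False
    speed_critical: bool = False
    permissive_license_required: bool = False  # AGPL-3.0 not acceptable


def choose_extractor(profile: DocProfile) -> str:
    """Pick "docling" or "mineru" using the recommendations above.

    Rule order is an assumption: license, then speed, then structure needs.
    """
    if profile.permissive_license_required:
        return "docling"  # MinerU's AGPL-3.0 rules it out
    if profile.speed_critical:
        return "docling"  # roughly 1.8x faster in this test
    if profile.has_complex_tables or profile.needs_latex_equations:
        return "mineru"   # structure accuracy is worth the extra seconds
    return "docling"      # simple documents: basic extraction is sufficient


print(choose_extractor(DocProfile(has_complex_tables=True)))  # mineru
```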
