Which OCR model actually preserves table structure? We tested Claude, GPT-5.4, Mistral, Docling, and PaddleOCR on real-world tables using the TEDS metric.
TEDS measures how accurately an OCR model preserves table structure. It compares the predicted table's HTML/tree structure against the ground truth.
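TEDS = 1 - (edits / max_nodes)

Here, edits is the tree edit distance between the predicted and ground-truth table trees, and max_nodes is the node count of the larger of the two trees. TEDS was introduced in the PubTabNet paper (2019) and is now the standard metric for table extraction evaluation. Higher is better.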
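
To make the normalization concrete, here is a minimal sketch of the final step of the formula. It assumes the tree edit distance has already been computed elsewhere (the official script uses a dedicated tree edit distance algorithm, which is not reproduced here). The helper names `count_nodes` and `teds_from_edits` are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class _NodeCounter(HTMLParser):
    """Counts opening tags, i.e. element nodes, in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def count_nodes(html: str) -> int:
    """Return the number of element nodes in an HTML table string."""
    counter = _NodeCounter()
    counter.feed(html)
    counter.close()
    return counter.count


def teds_from_edits(edits: float, pred_html: str, true_html: str) -> float:
    """Apply TEDS = 1 - edits / max_nodes to a precomputed edit distance."""
    max_nodes = max(count_nodes(pred_html), count_nodes(true_html))
    if max_nodes == 0:
        return 1.0  # two empty trees are treated as identical
    return 1.0 - edits / max_nodes


# Ground truth has 4 nodes (table, tr, td, td); the prediction drops one cell.
truth = "<table><tr><td>a</td><td>b</td></tr></table>"
pred = "<table><tr><td>a</td></tr></table>"
print(teds_from_edits(1, pred, truth))  # 0.75
```

One missing cell out of four nodes costs a quarter of the score, which is why small structural slips on compact tables move TEDS noticeably.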
| Model | Type | TEDS Simple | TEDS Complex | TEDS Overall | Structure | Speed | Cost / 1k tables |
|---|---|---|---|---|---|---|---|
| PaddleOCR-VL | OSS | 96.8 | 91.2 | 93.52 | 97% | 850ms | Free |
| MinerU 2.5 | OSS | 95.1 | 89.8 | 91.90 | 95% | 1470ms | Free |
| GPT-5.4 | API | 94.2 | 87.5 | 90.10 | 92% | 2300ms | $7.50 |
| Claude Sonnet 4 | API | 93.8 | 86.9 | 89.50 | 91% | 2800ms | $6.00 |
| dots.ocr 3B | OSS | 92.4 | 86.8 | 88.90 | 90% | 920ms | Free |
| Docling | OSS | 89.2 | 82.4 | 85.10 | 88% | 680ms | Free |
| Mistral OCR 3 | API | 91.5 | 70.9 | 79.75 | 85% | 1200ms | $4.00 |

TEDS Simple: Tables without merged cells. TEDS Complex: Tables with rowspan/colspan. Structure: Row/column alignment accuracy. Speed: Per-table processing time.
Table structure preservation goes beyond text extraction. We tested how each model handles common challenges.
| Challenge | PaddleOCR-VL | MinerU | GPT-5.4 | Claude | Docling |
|---|---|---|---|---|---|
| Simple grid tables | Excellent | Excellent | Excellent | Excellent | Good |
| Merged cells (colspan) | Excellent | Excellent | Good | Good | Partial |
| Multi-row headers | Excellent | Good | Good | Good | Partial |
| Borderless tables | Good | Excellent | Excellent | Excellent | Good |
| Tables with images | Good | Good | Excellent | Excellent | Partial |
| Rotated/skewed tables | Excellent | Good | Good | Good | Poor |
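
Merged cells and multi-row headers are hard because a single HTML cell occupies several positions in the logical grid. The sketch below shows one way to expand `rowspan`/`colspan` into a plain grid so that two extractions can be compared cell by cell. It uses only the standard library, and the names `_CellCollector` and `expand_spans` are hypothetical; it is not the method used by any of the benchmarked models.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collects (text, rowspan, colspan) for each cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def expand_spans(html):
    """Return a rectangular grid where spanned cells are repeated."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    occupied = {}  # (row, col) -> cell text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in occupied:  # skip slots filled by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied[(r + dr, c + dc)] = text
            c += colspan
    if not occupied:
        return []
    n_rows = max(r for r, _ in occupied) + 1
    n_cols = max(c for _, c in occupied) + 1
    return [[occupied.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


html = (
    '<table><tr><th colspan="2">Revenue</th><th rowspan="2">Notes</th></tr>'
    "<tr><td>Q1</td><td>Q2</td></tr></table>"
)
for row in expand_spans(html):
    print(row)
```

A model that drops the `colspan` on "Revenue" would shift "Notes" one column to the left in this grid, which is the kind of misalignment the merged-cell rows in the table above are probing.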

Claude Sonnet 4 (Anthropic API):

    import base64

    import anthropic

    # Reads ANTHROPIC_API_KEY from the environment
    client = anthropic.Anthropic()

    # Encode the table image as base64 for the Messages API
    with open("table.png", "rb") as f:
        image_data = base64.b64encode(f.read()).decode()

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": """Extract the table from this image.
    Return as markdown table with exact cell values.
    Preserve merged cells using colspan notation."""
                }
            ]
        }]
    )
    print(response.content[0].text)

PaddleOCR (PP-Structure):

    # Assumes PaddleOCR 2.x, which exposes PPStructure at the package top level
    from paddleocr import PPStructure

    # Initialize table recognition
    table_engine = PPStructure(table=True, ocr=True)

    # Process image
    result = table_engine("table.png")

    # Extract table structure
    for item in result:
        if item['type'] == 'table':
            html_table = item['res']['html']
            print(html_table)
            # Convert to markdown if needed
            # from markdownify import markdownify
            # print(markdownify(html_table))

Docling:

    from docling.document_converter import DocumentConverter

    # Initialize converter
    converter = DocumentConverter()

    # Process document
    result = converter.convert("document.pdf")

    # Tables are collected at the document level, across all pages
    for table in result.document.tables:
        # Get as markdown
        print(table.export_to_markdown())
        # Or as pandas DataFrame
        # df = table.export_to_dataframe()
        # print(df)

MinerU:

    # NOTE: the MinerU class and its parameters are reproduced as given here and
    # are unverified; check the MinerU documentation for the interface your
    # installed version actually exposes.
    from mineru import MinerU

    # Initialize with table extraction focus
    miner = MinerU(
        enable_table=True,
        table_format="markdown"  # or "html", "latex"
    )

    # Extract from PDF
    result = miner.extract("research_paper.pdf")

    # Process tables
    for page in result:
        for block in page.blocks:
            if block.category == 'table':
                print(block.to_markdown())
                # LaTeX output for scientific docs
                # print(block.to_latex())

Lowest hallucination rate (0.09%). Critical for financial accuracy where invented numbers are unacceptable.
Alternative · PaddleOCR-VL for high volume, local processing
LaTeX equation support + 95% structure preservation. Handles complex multi-row headers.
Alternative · PaddleOCR-VL for simpler tables
Best TEDS score (93.52) + free + fast. Line item extraction is accurate.
Alternative · Docling for PDF invoices specifically
Can output structured JSON directly. Good for forms with varied layouts.
Alternative · Claude for lower error rates
Handles degraded scans better. 96.8% on simple tables even with noise.
Alternative · dots.ocr for multilingual historical docs
Fastest local processing (680ms). No API costs. Apache 2.0 license.
Alternative · PaddleOCR-VL for higher accuracy
Open source costs assume a cloud GPU at $0.50/hour.
| Model | Cost / Table | 10k / mo | TEDS Score | $ / TEDS Point |
|---|---|---|---|---|
| PaddleOCR-VL | $0.00012 | $1.20 | 93.52 | $0.013 |
| Docling | $0.00009 | $0.90 | 85.1 | $0.011 |
| MinerU 2.5 | $0.00020 | $2.00 | 91.9 | $0.022 |
| Claude Sonnet 4 | $0.006 | $60.00 | 89.5 | $0.67 |
| GPT-5.4 | $0.0075 | $75.00 | 90.1 | $0.83 |
Bottom line: Open source models cost roughly 30-60× less per table than the API models (PaddleOCR-VL about 50-60×, MinerU about 30-40×) with comparable or better accuracy. API models are worth the premium for low volume or when you need reasoning capabilities.
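
The cost figures can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions that the table appears consistent with but does not state outright: local cost per table is the per-table latency billed at the $0.50/hour GPU rate, and "$ / TEDS Point" is the 10k-tables monthly cost divided by the TEDS score. The function names are hypothetical.

```python
GPU_RATE_PER_HOUR = 0.50  # USD, the stated cloud GPU assumption


def local_cost_per_table(latency_ms, gpu_rate_per_hour=GPU_RATE_PER_HOUR):
    """GPU cost of one table, assuming the GPU is billed only while processing."""
    return (latency_ms / 1000.0) * gpu_rate_per_hour / 3600.0


def dollars_per_teds_point(cost_per_table, teds, tables_per_month=10_000):
    """Monthly cost at the given volume divided by the TEDS score."""
    return cost_per_table * tables_per_month / teds


# PaddleOCR-VL: 850 ms per table, TEDS 93.52
paddle_cost = local_cost_per_table(850)
print(f"{paddle_cost:.5f}")                                   # 0.00012
print(f"{dollars_per_teds_point(0.00012, 93.52):.3f}")        # 0.013

# Claude Sonnet 4: $6.00 per 1k tables, TEDS 89.5
claude_cost = 6.00 / 1000
print(f"{dollars_per_teds_point(claude_cost, 89.5):.2f}")     # 0.67

# Cost ratio between the two at the table's rounded per-table prices
print(round(claude_cost / 0.00012))                           # 50
```

Note that the local figure ignores idle GPU time, so real savings depend on keeping the GPU busy; at low volume the gap to the API models narrows.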
All benchmarks were run on standardized test sets, including PubTabNet, TableBank, and FinTabNet. TEDS scores were calculated using the official evaluation script from the PubTabNet paper.