Which OCR stack minimizes manual review cost?
Independent, reproducible, failure-mode-level evaluation
For teams choosing OCR for production documents. EU languages, real scans, actual breakage patterns.
90-Second Clarity
Who is this for?
Teams choosing OCR for production documents. EU languages (Polish, German, Czech), real scans with stamps and noise, regulatory compliance requirements.
What decision does this help?
Which OCR stack minimizes manual review cost. Not "best accuracy" but "least expensive failures for your document type."
Why trust this?
100% independent, no vendor investment. We run our own benchmarks on real documents. GDPR compliant - data stays in EU.
Next action?
Read failure taxonomy below. Pick document type. Request private evaluation on your documents.
What Actually Breaks
Forget accuracy percentages. Here's what fails in production and which models handle it.
Dropped Diacritics
Polish ą, ę, ó → a, e, o. German ä, ö, ü → a, o, u. Czech ř, ů → r, u. Changes legal meaning.
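A quick way to catch this failure mode in an evaluation harness is to compare OCR output against a reference with diacritics stripped. A minimal sketch using only the standard library (the function names are ours, not from any OCR toolkit):

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    # NFD separates base letters from combining marks; drop the marks.
    # Letters without a decomposition (e.g. Polish l-stroke) pass through.
    return "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if not unicodedata.combining(ch)
    )

def diacritic_loss_rate(reference: str, ocr_output: str) -> float:
    """Fraction of diacritic-bearing characters the OCR flattened.
    Assumes reference and output are already character-aligned."""
    pairs = [
        (r, o) for r, o in zip(reference, ocr_output)
        if strip_diacritics(r) != r  # reference char carries a mark
    ]
    if not pairs:
        return 0.0
    lost = sum(1 for r, o in pairs if o == strip_diacritics(r))
    return lost / len(pairs)
```

A score near 1.0 means the model systematically flattens ą/ä/ř to their base letters; near 0.0 means diacritics survive.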
Column Bleed (Multi-column Layouts)
Two-column PDFs read as "Line from left | Line from right | Next left | Next right". Destroys paragraph structure. Common in contracts, scientific papers.
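When the bleed is strictly alternating, the damage is sometimes recoverable in post-processing. A naive sketch (a heuristic of ours, not a general fix):

```python
def deinterleave_columns(lines: list[str]) -> list[str]:
    """Naive repair for column bleed, assuming the OCR emitted lines in
    strict left/right alternation. Real layouts need geometric analysis
    (column detection from word bounding boxes)."""
    return lines[0::2] + lines[1::2]
```

This only works for the clean alternating case; models that preserve layout in the first place avoid the problem entirely.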
Numeric Substitution
8 → B, 0 → O, 1 → l (lowercase L), 5 → S. Fatal for invoice totals, account numbers, tax IDs. Low-quality scans amplify this.
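For fields whose schema guarantees digits, the substitutions above can be mapped back deterministically. A minimal sketch (the confusion table is our assumption of the most common pairs, and should be tuned to your scans):

```python
# Common OCR letter-to-digit confusions seen in low-quality scans.
CONFUSIONS = {"B": "8", "O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "s": "5"}

def repair_numeric_field(raw: str) -> str:
    """Map confusable letters back to digits, for fields known to be
    numeric (totals, account numbers, tax IDs). Only safe when the
    field schema guarantees digits."""
    return "".join(CONFUSIONS.get(ch, ch) for ch in raw)
```

Never apply this to free text, where "Bill" would silently become "8i11".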
Header/Footer Hallucination
Page numbers, watermarks, "COPY" stamps read as content. "Page 3 of 12" appears mid-paragraph. Clutters extracted text, breaks search.
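A simple filter can strip the most predictable page furniture before text reaches your index. The patterns below are illustrative starting points, not a complete list; tune them against your own corpus:

```python
import re

# Hypothetical patterns for common page furniture; extend per corpus.
FURNITURE = [
    re.compile(r"^\s*page\s+\d+\s+(of|/)\s*\d+\s*$", re.IGNORECASE),
    re.compile(r"^\s*(copy|draft|confidential)\s*$", re.IGNORECASE),
]

def strip_page_furniture(lines: list[str]) -> list[str]:
    """Drop lines that match known header/footer/stamp patterns."""
    return [ln for ln in lines if not any(p.match(ln) for p in FURNITURE)]
```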
Table Structure Collapse
Tables read as linear text. Loses row/column relationships. Invoice line items become gibberish. Measured by TEDS (Tree-Edit-Distance-based Similarity).
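Full TEDS computes tree edit distance over the table's HTML structure. For a quick smoke test in your own pipeline, a cruder cell-level F1 can flag collapsed tables; a minimal sketch under the assumption that cells are (row, col, text) tuples:

```python
def cell_f1(pred_cells: set, true_cells: set) -> float:
    """Crude proxy for table fidelity over (row, col, text) cell tuples.
    Real TEDS compares the full HTML tree, so treat this only as a
    smoke test, not a replacement for the benchmark metric."""
    if not pred_cells or not true_cells:
        return 0.0
    tp = len(pred_cells & true_cells)
    precision = tp / len(pred_cells)
    recall = tp / len(true_cells)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```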
Stamp/Signature Interference
Circular stamps, handwritten signatures overlay printed text. OCR reads both, creating garbage. "APPROVED" stamp corrupts underlying sentence.
If Your Priority Is...
Choose your constraint, see recommended models with honest tradeoffs.
Privacy (GDPR, data residency, no cloud)
Medical records, legal documents, customer PII. Must process on-premise or EU cloud only.
Best: PaddleOCR-VL 0.9B
Open-source, runs on-premise. Strong table handling.
Tradeoff: Lower accuracy than GPT-4o on complex layouts
Alternative: Tesseract + post-correction
100% local, battle-tested, free.
Tradeoff: Needs language-specific tuning, poor on tables
Cost (processing millions of pages)
Scanning archives, digitizing libraries, high-volume automation.
Best: PaddleOCR (open-source)
Zero per-page cost. Good multilingual support.
Tradeoff: Worse than VLMs on handwriting, complex tables
Alternative: Mistral OCR 3 (batch)
$1/1000 pages with batch API. Fast inference (1.2 pages/sec).
Tradeoff: Still costs money at scale, API dependency
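The cost constraint comes down to simple arithmetic. A back-of-envelope estimator using the per-page and throughput figures quoted above (both vary by provider, hardware, and page complexity):

```python
def batch_cost_estimate(pages: int,
                        usd_per_1k: float = 1.0,
                        pages_per_sec: float = 1.2) -> dict:
    """Rough cost and wall-clock time for a batch OCR run. Defaults
    use the figures quoted above; treat them as illustrative only."""
    return {
        "usd": pages / 1000 * usd_per_1k,
        "hours": pages / pages_per_sec / 3600,
    }
```

At one million pages, the batch API costs about $1,000 and roughly 230 hours of single-stream processing, which is where zero-per-page open-source options start to win.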
Table Extraction (invoices, reports, structured data)
Need to preserve row/column relationships. Extract line items, financial tables.
Best: PaddleOCR-VL
88.56 TEDS on OmniDocBench. HTML/Markdown table output.
Tradeoff: Requires GPU for good speed
Alternative: dots.ocr 3B
Best table TEDS among 3B models. Compact.
Tradeoff: Newer model, less battle-tested than PaddleOCR
Handwriting (forms, notes, signatures)
Handwritten forms, doctor's notes, survey responses. Cursive and messy text.
Best: CHURRO (3B)
70.1 Levenshtein similarity on the handwritten CHURRO-DS benchmark. Built for historical documents.
Tradeoff: Specialized for historical handwriting; may be overkill for modern forms
Alternative: GPT-4o
Strong handwriting support, multimodal context helps.
Tradeoff: API cost, slower than specialized models
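Scores like "70.1 Levenshtein" are typically edit distance normalized into a similarity percentage. A minimal sketch of that metric, under our assumption of length-normalized scoring (benchmark-specific normalization may differ):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def similarity(reference: str, ocr_output: str) -> float:
    """Edit distance normalized to a 0-100 score; our assumption of
    how Levenshtein-style benchmark numbers are computed."""
    if not reference and not ocr_output:
        return 100.0
    dist = levenshtein(reference, ocr_output)
    return 100.0 * (1 - dist / max(len(reference), len(ocr_output)))
```

The same function doubles as a per-document review-cost signal: lower similarity means more characters a human has to fix.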
Multi-language (40+ languages, diacritics, non-Latin)
Polish, German, Czech, Arabic, Thai, Korean. Mixed-language documents.
Best: Gemini 2.5 Pro
Tops OCRBench v2 Chinese, KITAB-Bench Arabic, MME-VideoOCR.
Tradeoff: API-only, cost for high volume
Alternative: Chandra OCR 0.1.0
40+ languages, open-source, strong on old scans.
Tradeoff: 9B params, slower inference
Speed (real-time processing, low latency)
Document upload flows, real-time data entry, mobile scanning apps.
Best: Mistral OCR 3
1.22 pages/sec verified by CodeSOTA. Good accuracy.
Tradeoff: API dependency, cost per page
Alternative: PaddleOCR (GPU)
Very fast on GPU, open-source, no API latency.
Tradeoff: Requires GPU infrastructure, setup complexity
Document Type Quick Guide
Recommendations by document category with specific failure modes to watch.
Invoices & Receipts
Critical: Table structure, numeric accuracy, VAT/tax fields.
Contracts & Legal
Critical: Diacritics (name accuracy), column layout, stamps.
Scientific PDFs
Critical: Formulas, multi-column, figures, citations.
Forms with Handwriting
Critical: Mixed print/handwriting, field extraction, checkboxes.
ID Documents
Critical: Diacritics, security features, photo interference.
Low-quality Scans/Fax
Critical: Noise handling, numeric errors, degraded text.
Private OCR Evaluation
We run the same benchmark on your documents
What you get:
- OCR benchmark on your actual documents (PDF, scans, images)
- Failure-mode analysis: which errors you'll see in production
- Model recommendations ranked by manual review cost for your docs
- GDPR compliant: data processed in EU, deleted after report delivery
- Runnable code + deployment guide for top-ranked model
Early access. No spam. Unsubscribe anytime.
No pricing yet - we're validating the format with early users.
Shape: 100-page sample → failure analysis + model ranking → ~1 week turnaround.
Why Trust This Guide?
100% Independent
No vendor investment, no affiliate links, no sponsored rankings. We make money from private evaluations, not OCR vendors.
GDPR Compliant
Data stays in EU. Private evaluations processed on EU servers, deleted after delivery. No US cloud providers for sensitive docs.
Open Methodology
All benchmarks documented. Read our methodology or see raw data.
Ready to choose your OCR stack?
Request a private evaluation on your documents, or start with our public benchmarks.