Codesota · OCR · Vol. II · The most commercially relevant benchmark category on the site · Updated: June 16, 2026
§ 00 · The verdict

The best OCR model in 2026 is PaddleOCR-VL-1.6.

For full-document parsing — the OCR job most teams actually ship — PaddleOCR-VL-1.6 leads OmniDocBench at 96.33, with MinerU2.5-Pro a hair behind at 95.69. There is no single winner across every job, so the table is the honest answer — each row is the current leader on its benchmark, with the date the score was recorded.

Scores pulled from the CodeSOTA registry (Papers With Code + vendor cards + official leaderboards + our own runs). Newest entry dated 2026-06-16. Note: OmniDocBench composite mixes v1.5/v1.6 reports and most per-model figures are vendor self-reported — treat as a prior, then run a private eval.
| The job | Benchmark | Best model | Score | As of |
|---|---|---|---|---|
| Full-document parsing | OmniDocBench | PaddleOCR-VL-1.6 | 96.33 (composite) | 2026-06-16 |
| PDF → Markdown | olmOCR-Bench | Infinity-Parser2-Pro | 87.6 (pass rate) | 2026-05-15 |
| Raw text recognition | OCRBench | Qwen3.5-397B-A17B | 931 / 1000 | 2026-05-18 |
| Visual text reasoning (EN) | OCRBench v2 EN | KDL Frontier | 68.1 (overall) | 2026-06-16 |
§ 00½ · How this benchmark works

A live OCR benchmark, not a vibe check.

Every OCR and document-AI model we can find a reproducible score for, ranked on the standard benchmarks — OmniDocBench, OCRBench, OCRBench v2, olmOCR-Bench and more. Each score is dated and links back to its source.

191 models on 17 benchmarks, 327 scored runs. Every number is drawn from benchmarks.json and scored through lib/scoring; nothing is interpolated, every score dated.

§ 01 · Benchmark surface

Open-weight OCR, ranked.

Models with downloadable weights and OCR/document scores in the registry. This table is filtered to OCR and document-AI benchmarks only; generic vision, medical, RL and code rows are excluded from this page.


Models: 77 | Benchmarks: OmniDoc, OCRBench, olmOCR | Source: benchmarks.json
Open weights · 77 models
Sorted by best available OCR/document score
| # | Model | Vendor | License | OmniDoc | OCRBench | OCRBench v2 EN | olmOCR | Trust |
|---|---|---|---|---|---|---|---|---|
| 01 | PaddleOCR-VL 1.6 | Baidu | Apache 2.0 | 96.33 | | | | vendor |
| 02 | MinerU2.5-Pro | OpenDataLab | Apache 2.0 | 95.69 | | | | vendor |
| 03 | PaddleOCR-VL 1.5 | Baidu | Apache 2.0 | 94.50 | | | 79.1 | paper |
| 04 | Qianfan-OCR | Baidu | check model card | 93.12 | 880 | 56.0% | 79.8 | verified |
| 05 | FireRed-OCR-2B | FireRedTeam | check model card | 92.94 | | | | verified |
| 06 | PaddleOCR-VL | Baidu | Apache 2.0 | 92.86 | | | 80.0 | paper |
| 07 | PaddleOCR-VL 0.9B | Baidu | Apache 2.0 | 92.56 | | | | paper |
| 08 | Qwen3-VL-235B-A22B-Instruct | Alibaba | check model card | | 920 | | | verified |
| 09 | SenseNova-U1-A3B-MoT | SenseTime | check model card | | 919 | | | verified |
| 10 | DeepSeek-OCR-2 | DeepSeek | check model card | 91.09 | | | 76.3 | verified |
| 11 | Qwen3.5-397B-A17B | Alibaba | check model card | 90.80 | 931 | | | verified |
| 12 | MinerU 2.5 | OpenDataLab | AGPL-3.0 | 90.67 | | | 75.2 | verified |
| 13 | InternVL3-78B | Shanghai AI Lab | check model card | | 906 | | | verified |
| 14 | Qwen3.6-35B-A3B | Alibaba | check model card | | 900 | | | verified |
| 15 | Qwen3-VL-8B-Instruct | Alibaba | check model card | | 896 | | | verified |
| 16 | Qwen3.6-27B | Alibaba | check model card | | 894 | | | verified |
| 17 | Qwen3-VL-235B | Alibaba | Qwen License | 89.15 | | | | paper |
| 18 | MonkeyOCR-pro-3B | Research | check model card | 88.85 | | | 75.8 | verified |
| 19 | Kimi K2.5 | Moonshot AI | check model card | 88.80 | 923 | | | verified |
| 20 | Falcon-OCR | TII | check model card | 88.64 | | | 80.3 | verified |
| 21 | OCRVerse 4B | Unknown | check model card | 88.56 | | | | paper |
| 22 | Qwen2.5-VL-72B | Alibaba | check model card | | 885 | | | verified |
| 23 | dots.ocr 3B | RedNote HILab | Apache 2.0 | 88.41 | | | 79.1 | paper |
| 24 | Qwen2-VL-72B | Alibaba | check model card | | 877 | | | verified |
| 25 | Infinity-Parser2-Pro | Infly | check model card | | 862 | | 87.6 | verified |
| 26 | MiniCPM-o 4.5-Instruct | OpenBMB | check model card | | 876 | | | verified |
| 27 | Qwen3-VL-235B-A22B-Thinking | Alibaba | check model card | | 875 | | | verified |
| 28 | MonkeyOCR-3B | Research | check model card | 87.13 | | | | verified |
| 29 | Qwen2.5-VL | Alibaba | Apache 2.0 | 87.02 | | | | paper |
| 30 | MonkeyOCR-pro-1.2B | Research | check model card | 86.96 | | | | verified |
| 31 | Kimi-VL-A3B-Thinking-2506 | Moonshot AI | check model card | | 869 | | | verified |
| 32 | PP-StructureV3 | Baidu | check model card | 86.73 | | | | paper |
| 33 | Kimi-VL-A3B-Instruct | Moonshot AI | check model card | | 867 | | | verified |
| 34 | Qwen2-VL-7B | Alibaba | check model card | | 866 | | | verified |
| 35 | MiniMax-VL-01 | MiniMax | check model card | | 865 | | | verified |
| 36 | DeepSeek-OCR | DeepSeek | check model card | 86.46 | | | 75.7 | paper |
| 37 | Qwen2.5-VL-7B | Alibaba | check model card | | 864 | | | verified |
| 38 | HunyuanOCR (1B) | Tencent | check model card | | 860 | | | verified |
| 39 | Chandra 2 | datalab-to | check model card | | | | 85.9 | verified |
| 40 | dots.mocr | RedNote HILab | check model card | | 860 | | 83.9 | verified |
| 41 | LightOnOCR-2-1B | LightOn | check model card | | | | 83.2 | verified |
| 42 | Chandra v0.1.0 | datalab-to | Apache 2.0 | | | | 83.1 | paper |
| 43 | Chandra | datalab-to | check model card | | | | 83.1 | verified |
| 44 | MiniCPM-V 4.6-Thinking (16x) | OpenBMB | check model card | | 831 | | | verified |
| 45 | VideoLLaMA3 7B | Unknown | check model card | | 828 | | | verified |
| 46 | Infinity-Parser 7B | Unknown | check model card | | | | 82.5 | verified |
| 47 | olmOCR v0.4.0 | Allen AI | Apache 2.0 | | | | 82.4 | paper |
| 48 | olmOCR | Unknown | check model card | 81.79 | | | 75.5 | verified |
| 49 | Qwen2-VL-2B | Alibaba | check model card | | 809 | | | verified |
| 50 | ZAYA1-VL-8B | Unknown | check model card | | 798 | | | verified |
| 51 | Qwen2.5-VL-3B | Alibaba | check model card | | 797 | | | verified |
| 52 | dots.ocr | RedNote HILab | check model card | | | | 79.1 | verified |
| 53 | VideoLLaMA3 2B | Unknown | check model card | | 779 | | | verified |
| 54 | Marker 1.10.0 | VikParuchuri | check model card | | | | 76.5 | paper |
| 55 | Marker 1.10.1 | VikParuchuri | check model card | | | | 76.1 | paper |
| 56 | DeepSeek OCR | DeepSeek | check model card | | | | 75.7 | paper |
| 57 | MiniCPM-Llama3-V 2.5 | OpenBMB | check model card | | 725 | | | verified |
| 58 | FireRed-OCR | FireRedTeam | check model card | | | | 70.2 | verified |
| 59 | Nemotron 3 Nano Omni 30B | NVIDIA | NVIDIA Open Model | | | 65.8% | | verified |
| 60 | Ovis2.5-9B | AIDC | check model card | | 879 | 63.4% | | verified |
| 61 | Qwen3-Omni-30B | Alibaba | Qwen License | | | 61.3% | | paper |
| 62 | Nemotron Nano V2 VL | NVIDIA | NVIDIA Open Model License | | | 61.2% | | paper |
| 63 | Intern-S1-Pro | Shanghai AI Lab | check model card | | | 60.1% | | verified |
| 64 | Qwen2.5-VL 72B | Alibaba | Apache 2.0 | | | | | missing |
| 65 | InternVL2-76B | Shanghai AI Lab | MIT | | | | | missing |
| 66 | Tesseract | Google (Open Source) | Apache 2.0 | | | | | missing |
| 67 | EasyOCR | JaidedAI | Apache 2.0 | | | | | missing |
| 68 | olmOCR v0.3.0 | Allen AI | check model card | | | | | missing |
| 69 | Qwen2-VL 72B | Alibaba | check model card | | | | | missing |
| 70 | Qwen2.5-VL 32B | Alibaba | check model card | | | | | missing |
| 71 | AIN 7B | Research | check model card | | | | | missing |
| 72 | PaddleOCR | Baidu | check model card | | | | | missing |
| 73 | InternVL3 14B | OpenGVLab | check model card | | | | | missing |
| 74 | DeepSeek-OCR (Gundam-M) | DeepSeek | check model card | | | | | missing |
| 75 | DeepSeek-OCR (Small, 100 vision tokens) | DeepSeek | check model card | | | | | missing |
| 76 | Qwen3.5-9B | Alibaba | Apache 2.0 | | | | | missing |
| 77 | MiniCPM-o-4.5 | OpenBMB | Apache 2.0 | | | | | missing |
Fig 1 · Open-weight OCR/document models on OmniDocBench, OCRBench, OCRBench v2 and olmOCR. Empty cells mean no reproducible score is in the registry yet. Trust badges are result-level: verified, paper, vendor, community or missing.
§ 02 · Vendor surface

APIs and closed endpoints, ranked.

Enterprises still pay for SLAs, compliance, audit logs, regional hosting and support. This table is intentionally separate from open weights; API endpoints, frontier VLMs and closed commercial OCR systems are not labeled open source.


List prices vary by region and volume. The fair cost formula is instance price per hour divided by pages per hour, plus storage, orchestration, retry rate and human review. For self-hosted comparisons see our economics essay.
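The cost formula above can be made concrete with a small calculator. This is a sketch under our own assumptions, not the model behind Fig 6: utilization scales measured throughput down, retries are modelled as pages parsed a second time, human review is a per-page charge on the reviewed fraction, and storage plus orchestration is folded into one flat per-1K overhead. `CostInputs` and `cost_per_1k_pages` are hypothetical names, and every number in the example is made up.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """Hypothetical inputs; every field is an assumption you measure yourself."""
    instance_usd_per_hour: float  # GPU instance (or API-equivalent) price per hour
    pages_per_hour: float         # measured throughput at full load
    utilization: float            # fraction of paid hours spent parsing, in (0, 1]
    retry_rate: float             # fraction of pages that must be parsed again
    review_rate: float            # fraction of pages sent to a human reviewer
    review_usd_per_page: float    # loaded cost of reviewing one page
    overhead_usd_per_1k: float    # storage + orchestration, flat per 1,000 pages


def cost_per_1k_pages(c: CostInputs) -> float:
    """Estimated USD per 1,000 delivered pages under the assumptions above."""
    if not 0 < c.utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Idle hours and retries both shrink the pages each paid hour actually delivers.
    effective_pages_per_hour = c.pages_per_hour * c.utilization / (1 + c.retry_rate)
    compute = 1000 * c.instance_usd_per_hour / effective_pages_per_hour
    review = 1000 * c.review_rate * c.review_usd_per_page
    return compute + review + c.overhead_usd_per_1k


if __name__ == "__main__":
    example = CostInputs(
        instance_usd_per_hour=2.0,
        pages_per_hour=2000,
        utilization=0.5,
        retry_rate=0.05,
        review_rate=0.02,
        review_usd_per_page=0.50,
        overhead_usd_per_1k=0.10,
    )
    print(round(cost_per_1k_pages(example), 2))  # → 12.2
```

With these made-up inputs, GPU compute is only about $2.10 of the $12.20; reviewing 2% of pages at $0.50 each costs $10. That is why the review rate belongs in any comparison against an API list price.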

Vendor APIs · 18 endpoints
Sorted by best available score
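| # | Endpoint | Provider | OmniDoc | OCRBench v2 EN | olmOCR | Trust | Price / 1K |
|---|---|---|---|---|---|---|---|
| 01 | Gemini 2.5 Pro | Google | 88.03 | 59.3% | | paper | varies |
| 02 | Mistral OCR 3 | Mistral | 79.75 | | 78.0 | verified | varies |
| 03 | Mistral OCR 2 | Mistral | | | 72.0 | paper | varies |
| 04 | GPT-4o (Anchored) | OpenAI | | | 69.9 | paper | varies |
| 05 | Nanonets OCR2 3B | Nanonets | | | 69.5 | paper | varies |
| 06 | KDL Frontier | KDL | | 68.1% | | verified | varies |
| 07 | Gemini Flash 2 | Google | | | 63.8 | paper | varies |
| 08 | Gemini 3 Pro Preview | Google | | 63.4% | | verified | varies |
| 09 | Seed1.6-vision | ByteDance | | 62.2% | | paper | varies |
| 10 | TeleMM-2.0 | TeleAI | | 61.8% | | verified | varies |
| 11 | GPT-4o | OpenAI | | 55.5% | | paper | varies |
| 12 | GPT-4o Mini | OpenAI | | 44.1% | | paper | varies |
| 13 | Claude Sonnet 4 | Anthropic | | 42.4% | | paper | varies |
| 14 | clearOCR | TeamQuest | 31.70 | | | verified | varies |
| 15 | Gemini 2.0 Flash | Google | | | | missing | varies |
| 16 | Gemini 1.5 Pro | Google | | | | missing | varies |
| 17 | Claude 3.5 Sonnet | Anthropic | | | | missing | varies |
| 18 | Azure OCR | Microsoft | | | | missing | varies |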
Fig 2 · Three of the four benchmarks in Fig 1: OmniDocBench, OCRBench v2 EN and olmOCR-Bench (classic OCRBench is not shown). Endpoint names read from the models.json registry. Price shown as “varies” where the vendor ties it to model tier / volume; see their pricing page.
§ 03 · Consistency

Cross-benchmark champions.

A single high score can be a training-set artefact. The models below place in the top three on multiple benchmarks, a harder and more honest bar.

Ranked by number of top-3 finishes, then by average rank, across 8 OCR benchmarks in the registry.

| # | Model | Coverage | Top-3s | Avg rank | Per-benchmark rank |
|---|---|---|---|---|---|
| 01 | Gemini 2.5 Pro | 4 / 8 | 2 | #8.8 | OmniDoc #21, OCRBench v2 EN #11, VideoOCR #1, Thai #2 |
| 02 | Gemini 1.5 Pro | 2 / 8 | 1 | #3.0 | CC-OCR #1, VideoOCR #5 |
| 03 | Qwen2.5-VL 72B | 2 / 8 | 1 | #3.5 | VideoOCR #2, Thai #5 |
| 04 | InternVL3-78B | 2 / 8 | 1 | #4.5 | OCRBench #6, VideoOCR #3 |
| 05 | Qwen2.5-VL 32B | 2 / 8 | 1 | #4.5 | VideoOCR #6, Thai #3 |
| 06 | Qwen3.5-397B-A17B | 2 / 8 | 1 | #6.0 | OmniDoc #11, OCRBench #1 |
| 07 | GPT-4o | 4 / 8 | 1 | #6.3 | OCRBench v2 EN #14, CC-OCR #4, VideoOCR #4, Arabic #3 |
| 08 | Ovis2.5-9B | 2 / 8 | 1 | #7.5 | OCRBench #12, OCRBench v2 EN #3 |
Fig 3 · Minimum coverage threshold: 2 benchmarks. Per-benchmark pills are copper when the model is top-3 on that benchmark.
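The ranking rule above is simple enough to reproduce. The sketch below feeds the per-benchmark ranks from Fig 3 into a hypothetical `champions` function (it is not the code in lib/scoring) and applies the stated order: most top-3 finishes first, then lowest average rank, with a minimum coverage of two benchmarks. On these inputs it reproduces the table's row order.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-benchmark ranks copied from the table above (1 = best).
RANKS: dict[str, dict[str, int]] = {
    "Gemini 2.5 Pro": {"OmniDoc": 21, "OCRBench v2 EN": 11, "VideoOCR": 1, "Thai": 2},
    "Gemini 1.5 Pro": {"CC-OCR": 1, "VideoOCR": 5},
    "Qwen2.5-VL 72B": {"VideoOCR": 2, "Thai": 5},
    "InternVL3-78B": {"OCRBench": 6, "VideoOCR": 3},
    "Qwen2.5-VL 32B": {"VideoOCR": 6, "Thai": 3},
    "Qwen3.5-397B-A17B": {"OmniDoc": 11, "OCRBench": 1},
    "GPT-4o": {"OCRBench v2 EN": 14, "CC-OCR": 4, "VideoOCR": 4, "Arabic": 3},
    "Ovis2.5-9B": {"OCRBench": 12, "OCRBench v2 EN": 3},
}


def champions(ranks, min_coverage=2, top_k=3):
    """Return (model, coverage, top-k finishes, average rank) tuples, best first."""
    rows = []
    for model, per_bench in ranks.items():
        if len(per_bench) < min_coverage:
            continue  # too few benchmarks to judge consistency
        values = list(per_bench.values())
        top = sum(1 for r in values if r <= top_k)
        rows.append((model, len(values), top, sum(values) / len(values)))
    # More top-k finishes first; ties broken by lower (better) average rank.
    # list.sort() is stable, so exact ties keep their input order.
    rows.sort(key=lambda row: (-row[2], row[3]))
    return rows


def show(avg: float) -> str:
    """Round half up to one decimal for display, so 6.25 shows as 6.3."""
    return str(Decimal(str(avg)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    for i, (model, cov, top, avg) in enumerate(champions(RANKS), start=1):
        print(f"{i:02d} {model:<18} {cov} / 8  {top}  #{show(avg)}")
```

Rounding half up matters only for display: Python's default float formatting rounds 6.25 to 6.2, while the table shows GPT-4o at #6.3.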
§ 04 · Figure

Twelve models, eight benchmarks.

A single grid to read coverage at a glance. PaddleOCR-VL and Gemini 2.5 Pro show the broadest reach; specialist systems light up a single column.

Rendered from the same registry as the tables above; green indicates higher normalised score within the benchmark.

Fig 4 · Heatmap of twelve OCR models × eight benchmarks. Each cell is normalised within its column; green marks a higher score. Greyed cells: no reproducible score in registry.
Fig 5 · Top-10 OCR models by OmniDocBench composite score (horizontal bar chart).
Fig 6 · Cost comparison for OCR systems. Price per 1,000 pages is a parameterized estimate: GPU or API cost, pages/hour, utilization, retries, orchestration and human review all change it.
§ 04½ · Task families

Which benchmark answers which OCR question?

The leaderboards above rank document parsing. These are the other OCR output contracts — each maps to a different benchmark family and a different sub-page.

§ 03½ · Ontology

Which benchmark answers which question?

A model is only SOTA for a use case when the benchmark output contract matches the production output contract.

| Benchmark class | Benchmarks | Measures | Does not measure |
|---|---|---|---|
| Document parsing | OmniDocBench, olmOCR-Bench, ParseBench | layout, tables, formulas, reading order | private invoice/KIE reliability |
| Visual text reasoning | OCRBench, OCRBench v2, CC-OCR | text localization plus reasoning | cost, throughput, structured extraction |
| Scene text | ICDAR, Total-Text, COCO-Text | text in natural images | PDFs, invoices, tables |
| Handwriting | IAM, RIMES, Polish EMNIST | handwriting CER and WER | forms, layout, field extraction |
| Tables | PubTabNet, FinTabNet, TableBank | table structure and cell F1 | full document parsing |
| Forms / KIE | SROIE, CORD, FUNSD, Kleister | key-value extraction and schema fields | free-form full-page OCR quality |
| DocVQA | DocVQA, InfographicVQA, MP-DocVQA | answer correctness | faithful full extraction |
| Multilingual OCR | KITAB-Bench, ThaiOCRBench, PolEval | language coverage, script-specific CER | general layout robustness |
Fig 3b · Benchmark ontology used by the router. OmniDocBench-style scores above 94 are still useful, but treat small deltas between them as benchmark saturation until private evals confirm the difference.

Looking for OCR? Try the parser, then inspect the layout.

CodeSOTA attracts people comparing OCR models. Hardparse turns that intent into a working document parser: upload one file here, get Markdown plus layout boxes from the same Hardparse API.

Open hardparse.com · Free OCR service, no CodeSOTA-side upload limit

For teams that want OCR backed by current SOTA models, API access, private documents, or volume parsing.

§ 05 · How it works

Three stages, one forward pass.

Classical OCR is a pipeline of three modules. First a detector draws boxes around text regions; then a recogniser reads the pixels inside each box into characters; finally a post-processor corrects the output and resolves reading order. Each module can fail independently, and the errors compound.
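A back-of-the-envelope way to see why pipeline errors compound: if each stage independently keeps a text span correct with some probability, end-to-end accuracy is the product of the stage accuracies. Independence is a simplifying assumption (real failures correlate, since a bad detection box also ruins recognition), and the stage numbers below are illustrative, not measured.

```python
from math import prod


def pipeline_accuracy(stage_accuracies):
    """End-to-end accuracy, assuming each stage fails independently of the others."""
    return prod(stage_accuracies)


# Illustrative per-stage accuracies, not measurements of any real system.
stages = {"detection": 0.98, "recognition": 0.97, "post-processing": 0.99}
print(f"{pipeline_accuracy(stages.values()):.3f}")  # → 0.941
```

Three stages that each look excellent in isolation still lose roughly 6% of spans together.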

Detection granularity has shifted from words to lines to whole regions. Word-level detection — the CRAFT / EAST tradition — still dominates scene text. Line-level dominates documents. Region-level detection is where modern vision-language models thrive: they see entire paragraphs as semantic units and preserve layout without a separate analysis step.
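To make the word/line distinction concrete, here is a naive sketch that assembles word-level boxes (the kind of output a CRAFT/EAST-style detector produces) into text lines by vertical overlap. It is our own illustrative heuristic, not the method of any particular library: it assumes upright, single-column text and ignores skew, and the `Box` type and `min_overlap` threshold are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def group_into_lines(words: list[Box], min_overlap: float = 0.5) -> list[str]:
    """Merge word boxes that overlap vertically into lines, then read each line left to right."""
    lines: list[list[Box]] = []
    for word in sorted(words, key=lambda b: (b.y0 + b.y1) / 2):  # top to bottom
        if lines:
            current = lines[-1]
            top = min(b.y0 for b in current)
            bottom = max(b.y1 for b in current)
            overlap = max(0.0, min(bottom, word.y1) - max(top, word.y0))
            # Join the current line if the overlap covers enough of the shorter extent.
            if overlap >= min_overlap * min(bottom - top, word.y1 - word.y0):
                current.append(word)
                continue
        lines.append([word])
    return [" ".join(b.text for b in sorted(line, key=lambda b: b.x0)) for line in lines]


words = [
    Box(60, 10, 110, 30, "world"),
    Box(0, 12, 50, 30, "Hello"),
    Box(0, 40, 60, 60, "second"),
    Box(70, 42, 100, 60, "line"),
]
print(group_into_lines(words))  # → ['Hello world', 'second line']
```

A two-column page would break this heuristic immediately, which is one reason region-level models that handle layout directly are attractive for documents.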

Recognition used to be CTC (Connectionist Temporal Classification), which is fast but treats each character as independent. Attention-based decoders, standard since 2018, let the model condition each character on the whole image. That is why modern OCR confuses “rn” with “m” and “l” with “1” far less often.
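CTC's speed comes from decoding each frame on its own. Below is a minimal sketch of greedy (best-path) CTC decoding, assuming the recogniser has already picked the most likely label for every frame: merge consecutive repeats, then drop the blank symbol. The `-` blank and the string-of-frames input are simplifications for illustration; real models emit a probability vector per frame with a reserved blank index.

```python
def ctc_greedy_decode(frame_labels: str, blank: str = "-") -> str:
    """Collapse repeated per-frame labels, then remove blanks (best-path CTC decoding)."""
    out = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


print(ctc_greedy_decode("hh-ee-ll-ll-oo"))  # → hello
print(ctc_greedy_decode("hheelloo"))        # → helo
```

The second call shows why CTC needs the blank: without a blank frame between them, the double "l" collapses into one. CTC scores each frame's label independently given the encoder features, so the decoder itself carries no language context; attention decoders condition each character on the ones already emitted.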

Post-processing is the unsexy part: language-model correction (“teh” to “the”), layout analysis (read left column before right), table structure recognition (scored by TEDS), and confidence filtering. It is also where traditional pipelines most often embarrass themselves.
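TEDS (Tree-Edit-Distance-based Similarity, introduced with the PubTabNet dataset) treats the predicted and ground-truth tables as HTML trees and normalises their tree edit distance by the size of the larger tree:

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\bigl(|T_a|, |T_b|\bigr)}
```

Here |T| is the number of nodes in tree T. Inserting or deleting a node costs 1, and substituting one cell for another is charged by how much their text differs (normalised Levenshtein distance), so a score of 1 means the predicted table matches the ground truth exactly.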

The 2023–2026 shift is that vision-language models fold all three stages into a single forward pass. They read the document the way a literate human does — as one object, with layout, structure and language considered at once. Traditional OCR is no longer SOTA for complex document understanding, but Tesseract, EasyOCR and classic PaddleOCR still matter for CPU-only, air-gapped, deterministic and simple high-throughput scans.

§ 06 · History

The short version.

OCR evolved from template matching to CRNN/CTC recognizers, transformer OCR and now VLM document parsers. The relevant 2026 shift is that OCR often means document understanding: Markdown, tables, formulas, layout, extraction and evidence, not just characters.

Traditional OCR remains competitive for constrained, cheap, deterministic and high-throughput clean text. VLM OCR wins when the output contract includes reading order, table structure, formulas, messy scans or schema extraction.

Read full OCR history
§ 07 · Decision tools

What are you trying to extract?

Pick the document type. Each link goes to a dedicated page with setup instructions, failure modes and a working code sample.

  1. Scenario · 01

    Invoices & receipts

    Line items, totals, vendor info → structured data. Table-heavy. Receipts fade and crumple.

    PaddleOCR-VL-1.5 · free · local
  2. Scenario · 02

    Handwritten notes

    Forms, signatures, meeting notes, historical documents. Variable slant, irregular spacing.

    TrOCR · free · local
  3. Scenario · 03

    PDFs & reports

    Multi-page documents, multi-column layout, tables, headers, footnotes.

    Chandra / olmOCR · free · local
  4. Scenario · 04

    Photos & screenshots

    Camera captures, screen grabs, social media imagery — often rotated, sometimes warped.

    PaddleOCR-VL-1.5 · free · local
  5. Scenario · 05

    Scanned books & archives

    Digitise printed text, old documents, historical archives with degraded print.

    PaddleOCR-VL-1.6 / MinerU2.5-Pro · free · local
  6. Scenario · 06

    ID cards & passports

    KYC verification, identity documents, MRZ code reading. Compliance and audit matter.

    Azure / Google · enterprise
§ 07½ · Private eval

Public SOTA is only the prior.

The real winner is the model that wins on your documents under your output contract.

Minimum OCR eval
  1. Use at least 50 documents; 200+ is better before procurement or migration.
  2. Include scans, photos, rotated pages, low DPI, handwriting, stamps, tables, formulas and multi-column PDFs.
  3. Score text CER, reading order, table TEDS, field F1, hallucinated text rate, missing block rate, page latency and cost/page separately.
  4. Do not average everything into one number unless the weights are visible.
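Two of the metrics in that list are easy to compute yourself. The sketch below assumes plain-text transcripts and flat key-value extraction: CER is character-level Levenshtein distance divided by reference length, field F1 counts a field as correct only on an exact key-and-value match, and the composite prints its weights next to the number. `WEIGHTS`, the field names and the sample invoice are hypothetical; use your own schema and publish your own weights.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def field_f1(expected: dict[str, str], predicted: dict[str, str]) -> float:
    """F1 over extracted fields; a field counts only if key and value both match exactly."""
    tp = sum(1 for key, value in predicted.items() if expected.get(key) == value)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(expected)
    return 2 * precision * recall / (precision + recall)


# Hypothetical weights: keep them visible whenever you report a single number.
WEIGHTS = {"text": 0.4, "fields": 0.6}

EXPECTED = {"invoice_no": "2026/05", "total": "1230.00", "currency": "PLN"}
PREDICTED = {"invoice_no": "2026/O5", "total": "1230.00", "currency": "PLN"}

if __name__ == "__main__":
    text_cer = cer("Faktura 2026/05", "Faktura 2026/O5")  # letter O read for digit 0
    f1 = field_f1(EXPECTED, PREDICTED)
    composite = WEIGHTS["text"] * max(0.0, 1 - text_cer) + WEIGHTS["fields"] * f1
    print(f"CER={text_cer:.3f}  field F1={f1:.3f}  composite={composite:.3f}  weights={WEIGHTS}")
```

A single "O" read for "0" costs only 1/15 (about 0.067) CER but breaks one of three fields, which is exactly why text accuracy and field accuracy should be reported separately.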
§ 08 · Long form

Deep dives & techniques.

  1. Essay

    The OCR economics shift

    Self-hosted VLM-OCR can beat API economics at scale, but only under explicit assumptions about throughput, utilization, retries and review rate.

  2. Architecture

    How Docling works

    The architecture of IBM’s document-understanding library and why VLM pipelines outperform traditional OCR.

  3. Engineering

    Interactive OCR correction

    Handling OCR “flicker” (H vs N) and camera drift in mobile apps. Google ML Kit plus centroid anchoring.

  4. Reference

    Benchmarks directory

    26 OCR benchmarks across document parsing, handwriting, video OCR, scene text and multilingual tasks.

  5. Case study

    Rys OCR — Polish SOTA

    71% CER reduction on Polish diacritics. LoRA fine-tune of PaddleOCR-VL. Apache 2.0.

  6. Tutorial

    Ship it

    3 questions, one recommendation, copy-paste code that runs in 10 minutes. For engineers who picked a model and need to wire it in.

§ 09 · Testing priority

What we still need to verify.

Generated by generateTestingPriorityList() in lib/scoring. Ranks outstanding (model, benchmark) pairs by importance weight — coverage gaps first, then benchmark criticality.

If you are planning to run one of these — see the submission note below.

| # | Model | Benchmark | Reason | Weight | Type |
|---|---|---|---|---|---|
| 01 | InternVL3-78B | omnidocbench | High performer (89.1) needs OmniDoc | 10.8 | OSS |
| 02 | InternVL3-78B | ocrbench-v2 | High performer (89.1) needs OCRBench | 10.8 | OSS |
| 03 | InternVL3-78B | olmocr-bench | High performer (89.1) needs olmOCR | 10.8 | OSS |
| 04 | Qwen2.5-VL 72B | omnidocbench | Primary benchmark missing | 7.2 | OSS |
| 05 | Qwen2.5-VL 72B | ocrbench-v2 | Primary benchmark missing | 7.2 | OSS |
| 06 | Qwen2.5-VL 72B | olmocr-bench | Primary benchmark missing | 7.2 | OSS |
| 07 | Qwen2.5-VL 32B | omnidocbench | Primary benchmark missing | 7.2 | OSS |
| 08 | Qwen2.5-VL 32B | ocrbench-v2 | Primary benchmark missing | 7.2 | OSS |
| 09 | Qwen2.5-VL 32B | olmocr-bench | Primary benchmark missing | 7.2 | OSS |
| 10 | InternVL3-78B | cc-ocr | High performer (89.1) needs CC-OCR | 7.2 | OSS |
Fig 7 · Top-10 testing priorities computed from the registry. Each row is a (model, benchmark) pair we have not independently verified yet. Sparkline is illustrative.

Run any of these OCR models?

Send us your numbers, or flag ones we got wrong. We verify and credit every contribution.

Share results →
§ 00½ · Find the OCR answer

Past the verdict, route to your exact job.

Library comparison

PaddleOCR vs EasyOCR

Speed, setup, accuracy, and when to use each Python OCR library.

Classic baseline

PaddleOCR vs Tesseract

CPU baseline vs modern OCR pipeline, with practical deployment tradeoffs.

Use case

Best handwriting OCR

GPT, Claude, Gemini, TrOCR, DTrOCR, and Azure Read for handwriting OCR.

Leaderboard

OCR benchmark pages

Task-specific OCR benchmark pages for documents, tables, handwriting, and VLM OCR.

§ 10 · Contribute

Know an OCR result we’re missing?

Fresh numbers, stale data, a model we haven’t tested — tell us. Real humans read every message, and every verified result gets attribution in the registry.

Spotted a number that looks wrong? Tell us →
§ 11 · FAQ

Frequently asked, honestly answered.

Questions that arrive in our inbox every week, answered with real numbers drawn from the tables above.

Q01 · What is the best OCR model in 2026?

For full-document parsing, the most common production OCR job, PaddleOCR-VL-1.6 leads OmniDocBench v1.6 at 96.33 (May 2026), with MinerU2.5-Pro at 95.69 and GLM-OCR around 95 — all open-weight VLMs under 1.3B params that beat far larger frontier models. For PDF-to-Markdown, Infinity-Parser2-Pro and Chandra 2 lead olmOCR-Bench at 87.6 and 85.9; for raw text recognition, Qwen3.5-397B-A17B tops OCRBench at 931; on the official OCRBench v2 English leaderboard, KDL Frontier leads at 68.1 with Nemotron-3-Nano-Omni-30B the top open model at 65.8. Pick by the job you are shipping.

Q02 · Which OCR model has the best English text recognition?

On classic OCRBench, the imported Papers With Code snapshot is led by Qwen3.5-397B-A17B at 931 points. On the official OCRBench v2 English leaderboard (2026.03), KDL Frontier leads at 68.1 (closed); the top open-weight model is NVIDIA Nemotron-3-Nano-Omni-30B at 65.8, ahead of Gemini 3 Pro Preview and Ovis2.5-9B at 63.4.

Q03 · Are open-weight OCR models better than paid APIs in 2026?

Sometimes. Open-weight VLM OCR can beat paid APIs on public document-parsing benchmarks and can be cheaper at scale, but APIs may still win on SLA, compliance, region, audit logs, latency guarantees, and integration. Treat public SOTA as a prior, then run a private benchmark.

Q04 · Which OCR is best for invoices and receipts?

For invoices and receipts, start with the best document-parsing or PDF-to-Markdown systems, then score field F1 on your schema. OmniDocBench is relevant for layout, tables, and formulas, but it does not prove production KIE reliability for your vendors, tax IDs, currencies, or Polish diacritics.

Q05 · How much does OCR cost per page?

OCR cost per 1,000 pages depends on instance price per hour, pages per hour, utilization, storage, orchestration, retry rate, and human review rate. Vendor APIs are simpler to buy; self-hosted open-weight models can be cheaper only when GPU utilization and operations are under control.

§ 12 · Methodology

Where these numbers come from.

All benchmark results on this page are sourced from leaderboards (AlphaXiv, Papers With Code and official benchmark leaderboards), published papers, vendor model cards and our own independent verification. Each data point in benchmarks.json carries a source URL and an access date; every ranking you see is recomputed on build from that file.

Results marked “pending verification” are claims that we have not independently confirmed. We do not include estimated or interpolated values.

Don’t want to pick a model? Drop a PDF at hardparse.com and get clean Markdown back — tables, formulas and layout preserved. It is our sister project running one of the top-ranked models on this page. Rankings stay independent of it.

Read next

Three places to go from here.

Condensed view
OCR Power Ranking
One ranking by average percentile across all OCR benchmarks, plus CodeSOTA-verified scores where we ran our own eval.
Practical guide
Best OCR for handwriting
Frontier VLMs (GPT-5, Claude Opus 4.7, Gemini 3) on IAM. CER, bounding-box support, code samples.
Comparison
PaddleOCR vs Tesseract vs dots.ocr
Three-way benchmark: throughput, edit distance, $/1K pages. When each OCR engine wins.