Codesota · OCR · Hardparse
#1 on OmniDocBench · 92.86 composite

SOTA document OCR, in your browser.

Hardparse runs PaddleOCR-VL-1.5 — the highest-scoring open-source OCR model — as a hosted web app. Drag in a PDF or scan, get clean Markdown back. Tables become real tables, formulas become LaTeX, handwriting just works.

Try hardparse.com · Read the cost analysis
Free tier: 5 pages/mo · No signup to try · API on Pro tier · 109 languages
§ 01 · Why teams switch

Why teams are switching from cloud OCR APIs.

The economics of document parsing changed in late 2025. Here's what that means for you.

$19
vs $65K from Textract

AWS Textract costs $65,000/mo at 1M pages. Hardparse's Pro tier is a flat $19/mo for unlimited pages, running the same SOTA model as the free tier at a tiny fraction of the price.

92.86
Beats every commercial API

PaddleOCR-VL-1.5 scores 92.86 on OmniDocBench — higher than GPT-5, Gemini 2.5 Pro, AWS Textract, and Google Document AI on document parsing.

0 setup
No GPUs to manage

You could run PaddleOCR-VL yourself — or upload a PDF and get Markdown back in seconds. Hardparse handles the infra so you can ship.

§ 02 · Comparison

How Hardparse compares.

| Feature | Hardparse | AWS Textract | Google Doc AI | GPT-5 |
|---|---|---|---|---|
| OmniDocBench score | 92.86 | ~85 | ~87 | ~88 |
| Table extraction | Native (Markdown) | Add-on ($$$) | Add-on | Prompt-based |
| Math / LaTeX | Yes | No | No | Partial |
| Handwriting | Yes | Yes | Yes | Yes |
| Languages | 109 | ~20 | ~60 | 90+ |
| Free tier | 5 pages/mo | None | 1K pages/mo | None |
| Pricing (1M pages) | $19/mo | $65K/mo | $30K/mo | Token-based |
| Setup time | Drag & drop | IAM + SDK | GCP project | API key + prompt |
| Output formats | Markdown, JSON | JSON | JSON | Text |

Accuracy from OmniDocBench v1.5. Commercial API pricing based on 1M pages/month standard tier. See full cost analysis.
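
To make the pricing row concrete, here is a small sketch of the arithmetic behind it. The monthly figures are the ones quoted in the table; the dictionary keys and function names are illustrative only. Because Pro is a flat rate, its effective per-page price depends on the volume you actually send, so the 1M-page figure is just the comparison point used on this page.

```python
# Monthly prices at 1M pages, as quoted in the comparison table.
MONTHLY_PRICE_USD = {
    "Hardparse Pro": 19,
    "AWS Textract": 65_000,
    "Google Doc AI": 30_000,
}
PAGES_PER_MONTH = 1_000_000


def cost_per_page(monthly_price_usd: float, pages: int = PAGES_PER_MONTH) -> float:
    """Effective price of a single page at the given monthly volume."""
    return monthly_price_usd / pages


def price_ratio(service: str, baseline: str = "Hardparse Pro") -> float:
    """How many times more expensive `service` is than `baseline`."""
    return MONTHLY_PRICE_USD[service] / MONTHLY_PRICE_USD[baseline]


if __name__ == "__main__":
    for name, price in MONTHLY_PRICE_USD.items():
        print(f"{name}: ${cost_per_page(price):.6f} per page")
    print(f"Textract vs Hardparse Pro: about {price_ratio('AWS Textract'):.0f}x")
```

At 1M pages, the quoted Textract price works out to $0.065 per page and the Pro plan to $0.000019 per page.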

§ 03 · Pricing

Two plans, one SOTA engine.

Start free. Upgrade when you need volume or API access.

For trying it out

Free

Upload a document, get clean Markdown back. No credit card, no account required to try a single file.

  • 5 pages per month
  • Full SOTA model (PaddleOCR-VL-1.5)
  • PDFs, images, multi-page documents
  • Tables, formulas, handwriting
  • Markdown and JSON output
  • Email delivery of results
$0 forever
Try it free

No credit card · No account to start

For production use

Pro

Unlimited pages, priority processing, and a REST API you can drop into your pipeline.

  • Unlimited pages
  • Priority processing queue
  • REST API access
  • Email support
  • Everything in Free
  • Supports CodeSOTA research
$19 / month
Upgrade to Pro

Every subscription funds CodeSOTA's independent research

§ 04 · Use cases

What people parse.

Invoices & Receipts
Extract line items, totals, tax info into structured data
Academic Papers
Tables, citations, equations rendered as LaTeX
Bank Statements
Transaction tables parsed into rows and columns
Contracts & Legal
Clauses, signatures, handwritten notes
Medical Records
Forms, lab results, handwritten notes
Engineering Drawings
Annotations, dimensions, part numbers extracted
Multilingual Docs
109 languages including CJK, Arabic, Devanagari
Scans & Screenshots
Any image, any resolution, structured text out
§ 05 · Pro API

Three lines to parse a document.

terminal
curl -X POST https://api.hardparse.com/v1/parse \
  -H "Authorization: Bearer hp_your_key" \
  -F "file=@invoice.pdf"

# Response:
{
  "regions": [
    { "type": "table", "confidence": 0.97, "markdown": "| Item | Qty | Price |\n|---|---|---|\n| Widget | 100 | $5.00 |" },
    { "type": "text", "confidence": 0.99, "markdown": "## Invoice #2847\nDate: March 15, 2026" },
    { "type": "handwriting", "confidence": 0.94, "markdown": "Approved - JS" }
  ],
  "processing_time_ms": 1240
}
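
Once the response is in hand, most pipelines want either one Markdown document or a list of regions to double-check. The sketch below assumes the response has exactly the shape shown in the example above (`regions`, each with `type`, `confidence`, and `markdown`); the helper names and the confidence threshold are hypothetical and not part of the Hardparse API.

```python
from typing import Any

# Response shape copied from the example above.
SAMPLE_RESPONSE: dict[str, Any] = {
    "regions": [
        {"type": "table", "confidence": 0.97,
         "markdown": "| Item | Qty | Price |\n|---|---|---|\n| Widget | 100 | $5.00 |"},
        {"type": "text", "confidence": 0.99,
         "markdown": "## Invoice #2847\nDate: March 15, 2026"},
        {"type": "handwriting", "confidence": 0.94, "markdown": "Approved - JS"},
    ],
    "processing_time_ms": 1240,
}


def regions_to_markdown(response: dict[str, Any], min_confidence: float = 0.0) -> str:
    """Join region Markdown in response order, skipping low-confidence regions."""
    kept = [
        region["markdown"]
        for region in response.get("regions", [])
        if region.get("confidence", 0.0) >= min_confidence
    ]
    return "\n\n".join(kept)


def low_confidence_regions(response: dict[str, Any], threshold: float) -> list[dict[str, Any]]:
    """Return the regions whose confidence falls below `threshold`."""
    return [
        region
        for region in response.get("regions", [])
        if region.get("confidence", 0.0) < threshold
    ]


if __name__ == "__main__":
    print(regions_to_markdown(SAMPLE_RESPONSE))
    for region in low_confidence_regions(SAMPLE_RESPONSE, threshold=0.95):
        print("review:", region["type"], region["confidence"])
```

Keeping the review list separate from the joined Markdown lets you store the full document while still flagging regions, such as handwriting, for a manual check.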
§ 06 · The story

The benchmark story behind Hardparse.

In late 2025, PaddleOCR-VL launched with 0.9 billion parameters and scored 92.86 on OmniDocBench — beating GPT-5, Gemini 2.5 Pro, and every commercial API. A model orders of magnitude smaller than frontier LLMs, but better at reading documents. Hardparse is the easiest way to put that model to work.

Read the full analysis → · See all benchmark results → · Compare OCR models →
§ 07 · FAQ

Frequently asked.

Do I need to sign up to try Hardparse?

No. You can upload a document without creating an account — you just enter an email address to receive the parsed results. The free tier gives you 5 pages per month.

How accurate is it on tables?

PaddleOCR-VL-1.5 is the top-scoring model on OmniDocBench table extraction. It outputs real Markdown tables and handles the nested tables, merged cells, and borderless tables that break AWS Textract.
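
Because tables come back as Markdown, turning one into rows takes only a few lines. This is a minimal sketch that assumes a simple pipe table with a header row and a separator row, like the one in the Pro API example; `parse_markdown_table` is a hypothetical helper, and it does not handle escaped pipes or cells that span several columns.

```python
def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Turn a simple pipe table into one dict per body row, keyed by header."""
    lines = [line.strip() for line in markdown.strip().splitlines() if line.strip()]
    if len(lines) < 2:
        return []

    def split_row(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_row(lines[0])
    # lines[1] is the |---|---| separator row, so body rows start at index 2.
    return [dict(zip(header, split_row(line))) for line in lines[2:]]


if __name__ == "__main__":
    table = "| Item | Qty | Price |\n|---|---|---|\n| Widget | 100 | $5.00 |"
    for row in parse_markdown_table(table):
        print(row)
```

All values stay as strings here; converting quantities and prices to numbers is left to the caller, since currency formats vary by document.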

What file types are supported?

PDF (including multi-page), PNG, JPG, TIFF, and HEIC. Output is Markdown or JSON. On the Pro tier, you can hit the REST API directly from your pipeline.
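
If you are batching uploads from a pipeline, it can help to filter files locally before sending them. The sketch below checks extensions against the formats listed above; the `.jpeg` and `.tif` spellings are an assumption (the page names only JPG and TIFF), and `is_supported` is a hypothetical helper, not something the service provides.

```python
from pathlib import Path

# Formats named in the FAQ answer. The ".jpeg" and ".tif" variants are an
# assumption: the page lists JPG and TIFF but not these alternate spellings.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".tif", ".heic"}


def is_supported(path: str) -> bool:
    """True if the file extension matches a listed format (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def split_by_support(paths: list[str]) -> tuple[list[str], list[str]]:
    """Partition paths into (supported, unsupported), preserving order."""
    supported = [p for p in paths if is_supported(p)]
    unsupported = [p for p in paths if not is_supported(p)]
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_support(["invoice.pdf", "scan.HEIC", "notes.docx"])
    print("upload:", ok)
    print("skip:", skipped)
```

An extension check only catches obvious mismatches; it does not verify that the file contents are valid.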

How does this compare to running PaddleOCR-VL myself?

You absolutely can self-host — the model is open source. Hardparse exists for people who don't want to provision GPUs, package the model, and keep it updated. Upload a file and you're done.

What about data privacy?

Hardparse is a hosted service, so documents are processed on the server. If you have hard privacy requirements (medical, legal, classified), self-hosting PaddleOCR-VL is the right call. For most teams the hosted app is the pragmatic path.

Is there a Mac or desktop app?

Not currently. Hardparse is a web app: the interface runs in your browser and documents are processed on a hosted backend, so there's nothing to install and updates are automatic.

Stop paying per page.

The highest-accuracy OCR model, running as a web app. Free to try, $19/mo for unlimited.

Try hardparse.com · Compare OCR models

Free: 5 pages/month, no credit card · Pro: $19/month, unlimited + API

§ 08 · Related

Related OCR content.

Analysis
The Economics Shift
How VLMs collapsed OCR pricing
Decision
OCR Decision Guide
Interactive tool for choosing the right model
Playbook
Ship It Guide
Implementation playbook for production OCR