Codesota · OCR · Hardparse

#1 on OmniDocBench · 92.86 composite

SOTA document OCR, in your browser.

Hardparse runs PaddleOCR-VL-1.5 — the highest-scoring open-source OCR model — as a hosted web app. Drag in a PDF or scan, get clean Markdown back. Tables become real tables, formulas become LaTeX, handwriting just works.

Try hardparse.com → · Read the cost analysis

Free tier: 5 pages/mo · No signup to try · API on Pro tier · 109 languages

§ 01 · Why teams switch

Why teams are switching from cloud OCR APIs.

The economics of document parsing changed in late 2025. Here's what that means for you.

$19

vs $65K from Textract

AWS Textract costs $65,000/mo for 1M pages. Hardparse's Pro tier is $19/mo for unlimited pages, running the top-scoring open-source model at a tiny fraction of the price.

92.86

Beats every commercial API

PaddleOCR-VL-1.5 scores 92.86 on OmniDocBench — higher than GPT-5, Gemini 2.5 Pro, AWS Textract, and Google Document AI on document parsing.

0 setup

No GPUs to manage

You could run PaddleOCR-VL yourself — or upload a PDF and get Markdown back in seconds. Hardparse handles the infra so you can ship.

§ 02 · Comparison

How Hardparse compares.

Feature	Hardparse	AWS Textract	Google Doc AI	GPT-5
OmniDocBench Score	92.86	~85	~87	~88
Table extraction	Native (Markdown)	Add-on ($$$)	Add-on	Prompt-based
Math / LaTeX	Yes	No	No	Partial
Handwriting	Yes	Yes	Yes	Yes
Languages	109	~20	~60	90+
Free tier	5 pages/mo	None	1K pages/mo	None
Pricing (1M pages)	$19/mo	$65K/mo	$30K/mo	Token-based
Setup time	Drag & drop	IAM + SDK	GCP project	API key + prompt
Output formats	Markdown, JSON	JSON	JSON	Text

Accuracy from OmniDocBench v1.5. Commercial API pricing based on 1M pages/month standard tier. See full cost analysis.
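To make the pricing row concrete, here is a rough sketch of the per-page arithmetic in Python. The monthly figures are the ones in the table; treating the commercial APIs as a flat per-page rate (rather than tiered pricing) is a simplifying assumption, and the helper names are hypothetical.

```python
import math

PAGES_PER_MONTH = 1_000_000

# Monthly cost at 1M pages, taken from the comparison table above.
monthly_cost_usd = {
    "Hardparse Pro": 19,
    "AWS Textract": 65_000,
    "Google Doc AI": 30_000,
}

def cost_per_page(monthly_usd: float, pages: int) -> float:
    """Effective cost per page at a given monthly volume."""
    return monthly_usd / pages

def break_even_pages(flat_fee_usd: float, per_page_usd: float) -> int:
    """Smallest page count at which per-page billing exceeds a flat fee.

    Assumes the per-page rate is constant at every volume, which is a
    simplification of how tiered cloud pricing usually works.
    """
    return math.ceil(flat_fee_usd / per_page_usd)

per_page = {name: cost_per_page(usd, PAGES_PER_MONTH)
            for name, usd in monthly_cost_usd.items()}

textract_break_even = break_even_pages(
    monthly_cost_usd["Hardparse Pro"], per_page["AWS Textract"]
)

for name, rate in per_page.items():
    print(f"{name}: ${rate:.6f} per page")
print(f"Flat $19/mo is cheaper than Textract above {textract_break_even} pages/mo")
```

Under that flat-rate assumption, the $19 plan pays for itself after a few hundred pages a month; real bills depend on each provider's tiering and feature add-ons.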

§ 03 · Pricing

Two plans, one SOTA engine.

Start free. Upgrade when you need volume or API access.

For trying it out

Free

Upload a document, get clean Markdown back. No credit card, no account required to try a single file.

✓ 5 pages per month
✓ Full SOTA model (PaddleOCR-VL-1.5)
✓ PDFs, images, multi-page documents
✓ Tables, formulas, handwriting
✓ Markdown and JSON output
✓ Email delivery of results

$0 forever

Try it free →

No credit card · No account to start

For production use

Pro

Unlimited pages, priority processing, and a REST API you can drop into your pipeline.

✓ Unlimited pages
✓ Priority processing queue
✓ REST API access
✓ Email support
✓ Everything in Free
✓ Supports CodeSOTA research

$19/month

Upgrade to Pro →

Every subscription funds CodeSOTA's independent research

§ 04 · Use cases

What people parse.

Invoices & Receipts

Extract line items, totals, tax info into structured data

Academic Papers

Tables, citations, equations rendered as LaTeX

Bank Statements

Transaction tables parsed into rows and columns

Contracts & Legal

Clauses, signatures, handwritten notes

Medical Records

Forms, lab results, handwritten notes

Engineering Drawings

Annotations, dimensions, part numbers extracted

Multilingual Docs

109 languages including CJK, Arabic, Devanagari

Scans & Screenshots

Any image, any resolution, structured text out

§ 05 · Pro API

Three lines to parse a document.

terminal

curl -X POST https://api.hardparse.com/v1/parse \
  -H "Authorization: Bearer hp_your_key" \
  -F "file=@invoice.pdf"

# Response:
{
  "regions": [
    { "type": "table", "confidence": 0.97, "markdown": "| Item | Qty | Price |\n|---|---|---|\n| Widget | 100 | $5.00 |" },
    { "type": "text", "confidence": 0.99, "markdown": "## Invoice #2847\nDate: March 15, 2026" },
    { "type": "handwriting", "confidence": 0.94, "markdown": "Approved - JS" }
  ],
  "processing_time_ms": 1240
}
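Once a response like the one above comes back, a pipeline typically needs to do two things with it: turn table regions into rows, and flag low-confidence regions for review. The sketch below shows one way to do that in Python using only the standard library. The field names (`regions`, `type`, `confidence`, `markdown`) come from the sample response; the 0.95 review threshold and the helper names are assumptions for illustration, not part of the Hardparse API.

```python
import json

# Sample response copied from the curl example above (raw string keeps
# the JSON "\n" escapes intact for json.loads).
SAMPLE_RESPONSE = r'''
{
  "regions": [
    { "type": "table", "confidence": 0.97, "markdown": "| Item | Qty | Price |\n|---|---|---|\n| Widget | 100 | $5.00 |" },
    { "type": "text", "confidence": 0.99, "markdown": "## Invoice #2847\nDate: March 15, 2026" },
    { "type": "handwriting", "confidence": 0.94, "markdown": "Approved - JS" }
  ],
  "processing_time_ms": 1240
}
'''

def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Convert a simple pipe-delimited Markdown table into row dicts.

    Does not handle escaped pipes inside cells.
    """
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip()]
    if len(lines) < 2:
        return []

    def cells(line: str) -> list[str]:
        return [c.strip() for c in line.strip("|").split("|")]

    header = cells(lines[0])
    # lines[1] is the |---|---| separator row, so data starts at index 2.
    return [dict(zip(header, cells(line))) for line in lines[2:]]

def split_by_confidence(regions: list[dict], threshold: float = 0.95):
    """Separate regions into (accepted, needs_review) by confidence.

    The threshold is a hypothetical value; tune it for your documents.
    """
    accepted = [r for r in regions if r["confidence"] >= threshold]
    needs_review = [r for r in regions if r["confidence"] < threshold]
    return accepted, needs_review

response = json.loads(SAMPLE_RESPONSE)
accepted, needs_review = split_by_confidence(response["regions"])

table_rows = [
    row
    for region in response["regions"]
    if region["type"] == "table"
    for row in parse_markdown_table(region["markdown"])
]

print(table_rows)
print([r["type"] for r in needs_review])
```

With the sample response, the invoice table becomes a single row dict and the handwriting region (confidence 0.94) is the only one routed to review.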

§ 06 · The story

The benchmark story behind Hardparse.

In late 2025, PaddleOCR-VL launched with 0.9 billion parameters and scored 92.86 on OmniDocBench — beating GPT-5, Gemini 2.5 Pro, and every commercial API. A model orders of magnitude smaller than frontier LLMs, but better at reading documents. Hardparse is the easiest way to put that model to work.

Read the full analysis → · See all benchmark results → · Compare OCR models →

§ 07 · FAQ

Frequently asked.

Do I need to sign up to try Hardparse?

No. You can upload a document without creating an account — you just enter an email address to receive the parsed results. The free tier gives you 5 pages per month.

How accurate is it on tables?

PaddleOCR-VL-1.5 is the top-scoring model on OmniDocBench table extraction. It outputs real Markdown tables and handles nested tables, merged cells, and borderless tables that break AWS Textract.

What file types are supported?

PDF (including multi-page), PNG, JPG, TIFF, and HEIC. Output is Markdown or JSON. On the Pro tier, you can hit the REST API directly from your pipeline.
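If you are batching uploads from a script, a quick local check can skip files the service will not accept. This is a minimal sketch: the format list comes from the answer above, but the exact extension spellings accepted (for example `.jpeg` or `.tif`) are an assumption, and `is_supported` is a hypothetical helper, not part of any Hardparse client.

```python
from pathlib import Path

# Formats listed in the FAQ; alternate spellings (.jpeg, .tif) are assumed.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".heic"}

def is_supported(filename: str) -> bool:
    """Return True if the file extension matches a listed input format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS

candidates = ["invoice.pdf", "scan.TIFF", "notes.docx", "photo.heic"]
uploadable = [name for name in candidates if is_supported(name)]
print(uploadable)
```

This only inspects the file name; it does not verify that the file contents are valid.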

How does this compare to running PaddleOCR-VL myself?

You absolutely can self-host — the model is open source. Hardparse exists for people who don't want to provision GPUs, package the model, and keep it updated. Upload a file and you're done.

What about data privacy?

Hardparse is a hosted service, so documents are processed on the server. If you have hard privacy requirements (medical, legal, classified), self-hosting PaddleOCR-VL is the right call. For most teams the hosted app is the pragmatic path.

Is there a Mac or desktop app?

Not currently. Hardparse is a web app: you use it from your browser and parsing runs on a hosted backend, so there's nothing to install and updates are automatic.

Stop paying per page.

The highest-accuracy OCR model, running as a web app. Free to try, $19/mo for unlimited.

Try hardparse.com → · Compare OCR models

Free: 5 pages/month, no credit card · Pro: $19/month, unlimited + API

§ 08 · Related