Hardparse runs PaddleOCR-VL-1.5 — the highest-scoring open-source OCR model — as a hosted web app. Drag in a PDF or scan, get clean Markdown back. Tables become real tables, formulas become LaTeX, handwriting just works.
The economics of document parsing changed in late 2025. Here's what that means for you.
AWS Textract costs $65,000/mo for 1M pages. Hardparse's Pro tier is $19/mo for unlimited pages: a state-of-the-art model at a tiny fraction of the price.
PaddleOCR-VL-1.5 scores 92.86 on OmniDocBench v1.5, higher than GPT-5, Gemini 2.5 Pro, AWS Textract, and Google Document AI on document parsing.
You could run PaddleOCR-VL yourself — or upload a PDF and get Markdown back in seconds. Hardparse handles the infra so you can ship.
| Feature | Hardparse | AWS Textract | Google Doc AI | GPT-5 |
|---|---|---|---|---|
| OmniDocBench Score | 92.86 | ~85 | ~87 | ~88 |
| Table extraction | Native (Markdown) | Add-on ($$$) | Add-on | Prompt-based |
| Math / LaTeX | Yes | No | No | Partial |
| Handwriting | Yes | Yes | Yes | Yes |
| Languages | 109 | ~20 | ~60 | 90+ |
| Free tier | 5 pages/mo | None | 1K pages/mo | None |
| Pricing (1M pages) | $19/mo | $65K/mo | $30K/mo | Token-based |
| Setup time | Drag & drop | IAM + SDK | GCP project | API key + prompt |
| Output formats | Markdown, JSON | JSON | JSON | Text |
Accuracy from OmniDocBench v1.5. Commercial API pricing based on 1M pages/month standard tier. See full cost analysis.
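The pricing row above reduces to simple arithmetic. The following is a minimal sketch that uses only the monthly figures quoted on this page and assumes exactly 1M pages per month; the function and variable names are illustrative, not part of any Hardparse tooling.

```python
# Monthly prices (USD) at 1M pages/month, copied from the comparison table.
# Assumption: volume is exactly 1,000,000 pages per month.
PAGES_PER_MONTH = 1_000_000

MONTHLY_USD = {
    "Hardparse Pro": 19,
    "AWS Textract": 65_000,
    "Google Doc AI": 30_000,
}


def cost_per_page(monthly_usd: float, pages: int = PAGES_PER_MONTH) -> float:
    """Effective price of a single page at the given monthly volume."""
    return monthly_usd / pages


def price_ratio(service: str, baseline: str = "Hardparse Pro") -> float:
    """How many times more expensive `service` is than `baseline`."""
    return MONTHLY_USD[service] / MONTHLY_USD[baseline]


if __name__ == "__main__":
    for name, usd in MONTHLY_USD.items():
        print(
            f"{name}: ${cost_per_page(usd):.6f}/page, "
            f"{price_ratio(name):,.0f}x the Hardparse Pro price"
        )
```

At this volume the AWS Textract figure works out to 6.5 cents per page, roughly 3,400 times the Hardparse Pro price.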
Start free. Upgrade when you need volume or API access.
Upload a document, get clean Markdown back. No credit card, no account required to try a single file.
Unlimited pages, priority processing, and a REST API you can drop into your pipeline.
Every subscription funds CodeSOTA's independent research.
In late 2025, PaddleOCR-VL launched with 0.9 billion parameters and scored 92.86 on OmniDocBench — beating GPT-5, Gemini 2.5 Pro, and every commercial API. A model orders of magnitude smaller than frontier LLMs, but better at reading documents. Hardparse is the easiest way to put that model to work.
No account is required. You can upload a document without signing up; you just enter an email address to receive the parsed results. The free tier gives you 5 pages per month.
PaddleOCR-VL-1.5 is the top-scoring model on OmniDocBench table extraction. It outputs real Markdown tables and handles nested tables, merged cells, and borderless tables that break AWS Textract.
PDF (including multi-page), PNG, JPG, TIFF, and HEIC. Output is Markdown or JSON. On the Pro tier, you can hit the REST API directly from your pipeline.
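This page does not document the REST API, so the following is only a rough sketch of what a pipeline call might look like. The endpoint URL, the `output` query parameter, the bearer-token header, and the raw-PDF request body are all assumptions made for illustration; check the actual Pro-tier API documentation before relying on any of them.

```python
import urllib.request

# HYPOTHETICAL endpoint: the real URL is not given on this page.
API_URL = "https://hardparse.example/api/v1/parse"


def build_parse_request(
    pdf_bytes: bytes, api_key: str, output: str = "markdown"
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one PDF.

    The query parameter and header scheme are assumed, not documented.
    """
    if output not in ("markdown", "json"):
        raise ValueError("output must be 'markdown' or 'json'")
    return urllib.request.Request(
        url=f"{API_URL}?output={output}",
        data=pdf_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/pdf",
        },
    )


def parse_document(path: str, api_key: str, output: str = "markdown") -> str:
    """Read a local PDF, send it, and return the response body as text."""
    with open(path, "rb") as f:
        request = build_parse_request(f.read(), api_key, output)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from sending makes the call easy to unit-test without network access, and only `API_URL` and the headers need to change once the real API details are known.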
You absolutely can self-host — the model is open source. Hardparse exists for people who don't want to provision GPUs, package the model, and keep it updated. Upload a file and you're done.
Hardparse is a hosted service, so documents are processed on the server. If you have hard privacy requirements (medical, legal, classified), self-hosting PaddleOCR-VL is the right call. For most teams the hosted app is the pragmatic path.
Hardparse is currently available only as a web app. Everything runs in your browser against a hosted backend, so there's nothing to install and updates are automatic.
The highest-accuracy OCR model, running as a web app. Free to try, $19/mo for unlimited pages.
Free: 5 pages/month, no credit card · Pro: $19/month, unlimited + API