Codesota · The Smart Router
One API · every task · three tiers
Issue: April 22, 2026
§ 00 · The pitch

One API. Every task.
Three tiers.

Codesota is the smart router for AI agents and apps. A single unified API across OCR, speech, vision, code, retrieval, and translation — with SOTA, balanced, and cheap tiers picked from the open benchmark registry. Not one vendor's models. Not one modality. Every task, measured, routed.

§ 01 · Audience

Why a router, and why now.

Three readers. One product. The same unified API serves the agent builder, the app builder, and the research team — because the registry underneath is the same registry for all of them.

Agent builders

Eight SDKs, one integration.

An agent like OpenClaw, Hermes, or a Claude-Code-class tool needs TTS, STT, OCR, code, retrieval, and vision all at once. Today that means eight SDKs, eight billing portals, eight rate-limit ceilings, and eight model-deprecation cycles. Codesota collapses all of it into a single bearer token and a single contract.

App builders

You shouldn't read a paper to transcribe a receipt.

Picking the right model per task means reading benchmark papers, reconciling leaderboards, renting GPUs, and stitching APIs. The router reads the registry for you and calls the model that wins on your chosen tier — SOTA, balanced, or cheap — so the product team can ship the feature instead of the infrastructure.

Research teams

What's SOTA is what ships.

When the benchmark registry moves — a new OCR model clears OmniDocBench, a new embedder takes MTEB — the router picks it up on the next call. No drift between the number on the leaderboard and the model in production. The assay is the contract.

§ 02 · Contract

The three-tier grammar.

One request parameter, three possible answers. The tier grammar is the same for every endpoint — /v1/ocr, /v1/tts, /v1/stt — because the benchmark underneath is the arbiter, not a vendor's marketing page.


  • Param · {"tier": "sota" | "balanced" | "cheap"}
  • Default · balanced
  • Arbiter · the registry at /tasks
  • Tier spec · applies to every /v1/<task> endpoint
  • sota: frontier model for the task; criterion: highest score on the canonical benchmark; cost profile: highest $/call. OCR example: GPT-5.4, Gemini 2.5 Pro, Claude Opus 4.6
  • balanced: best quality-per-dollar; criterion: cost-adjusted score across the benchmark; cost profile: an order of magnitude cheaper than SOTA. OCR example: open model on GPU cloud, hosted by us
  • cheap: OSS self-hosted, amortised; criterion: floor set by benchmark score, not vibes; cost profile: lowest $/call, latency varies. OCR example: PaddleOCR-VL-1.5, dots.ocr, MonkeyOCR-pro
Adjacent to, not the same as

Other platforms route — but not across modalities, not across vendors, and not on benchmark-derived tiers. A brief map:

  • OpenRouter routes within the LLM modality — one contract for text generation, many vendors. Codesota routes across modalities (OCR, TTS, STT, vision, code) under one contract.
  • Replicate and Together host models across modalities, but you still pick the model. Codesota picks per call, per tier, per benchmark.
  • Hyperscalers (AWS, Azure, GCP) offer task APIs, but each is locked to the vendor's own models. Codesota routes across vendors and across closed/open lines.
Fig 2 · The three tiers are a public contract, not a pricing page. The benchmark registry at /tasks is the arbiter for every pick. Nothing is model-hardcoded.
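The arbitration the tier table describes can be sketched in a few lines. This is an illustrative reading of the contract, not the production router: the registry rows mirror the OmniDocBench table in § 03, and the score floor for the cheap tier is a made-up stand-in.

```python
# Illustrative sketch of tier arbitration -- not the production router.
# Rows mirror the OmniDocBench table in § 03: (model, score, $ per 1K pages).
# The score_floor value is a hypothetical stand-in.
REGISTRY_OCR = [
    ("PaddleOCR-VL-1.5", 94.50,  0.09),
    ("dots.ocr 3B",      88.41,  0.04),
    ("MonkeyOCR-pro",    86.96,  0.03),
    ("GPT-5.4",          85.80, 15.00),
    ("Gemini 2.5 Pro",   84.20, 12.50),
    ("Mistral OCR 3",    83.40,  1.00),
]

def pick(tier: str, rows=REGISTRY_OCR, score_floor: float = 85.0) -> str:
    if tier == "sota":        # highest score on the canonical benchmark
        return max(rows, key=lambda r: r[1])[0]
    if tier == "balanced":    # best quality-per-dollar
        return max(rows, key=lambda r: r[1] / r[2])[0]
    if tier == "cheap":       # cheapest model that clears the score floor
        eligible = [r for r in rows if r[1] >= score_floor]
        return min(eligible, key=lambda r: r[2])[0]
    raise ValueError(f"unknown tier: {tier!r}")

print(pick("sota"))  # PaddleOCR-VL-1.5
```

Note that on this registry an open model wins "sota" outright, which is exactly the § 03 claim: the arbiter is the benchmark, not the vendor.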
§ 03 · Proof

One endpoint, already shipping.

We turned the thesis into a single endpoint. POST /v1/ocr is live today at hardparse.com. On OmniDocBench, the open-source winner is measurably better and two orders of magnitude cheaper than the closed frontier.


  • Endpoint · POST /v1/ocr
  • Hosted at · hardparse.com
  • Benchmark · OmniDocBench · Mar 2026
  • Full table · /ocr
OmniDocBench · top-6 · price per 1K pages
  #    Model              Org       Kind         Score   $/1K     Cost basis
  01   PaddleOCR-VL-1.5   Baidu     open         94.50   $0.09    self-hosted, amortised
  02   dots.ocr 3B        Rednote   open         88.41   $0.04    self-hosted, amortised
  03   MonkeyOCR-pro      OSS       open         86.96   $0.03    self-hosted, amortised
  04   GPT-5.4            OpenAI    closed API   85.80   $15.00   vendor API · retail
  05   Gemini 2.5 Pro     Google    closed API   84.20   $12.50   vendor API · retail
  06   Mistral OCR 3      Mistral   closed API   83.40   $1.00    vendor API · retail
Footnote · how the 167× is computed

PaddleOCR-VL-1.5's self-hosted cost of $0.09/1K pages is the amortised inference cost on a single A100 at typical utilisation: COGS, not a retail price. GPT-5.4's $15/1K is OpenAI's published list price. So this is a COGS-vs-retail comparison, which flatters the delta. For a straight retail-vs-retail comparison, the full table at /ocr lists the actual hosted price you can buy today for every row.
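The arithmetic behind the footnote, using the table's own numbers:

```python
# The 167x in the footnote: GPT-5.4's retail price per 1K pages divided by
# PaddleOCR-VL-1.5's amortised self-hosted cost per 1K pages.
gpt_retail = 15.00    # $/1K pages, vendor list price
paddle_cogs = 0.09    # $/1K pages, amortised on a single A100
ratio = gpt_retail / paddle_cogs
print(f"{ratio:.0f}x")  # 167x
```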

Fig 3 · OmniDocBench composite score combines text accuracy, layout understanding, and table extraction. Shaded row is the tier-1 winner the router picks for tier: "sota" on OCR today.
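A call against the live endpoint might look like the sketch below. Only the POST /v1/ocr path, the bearer token, and the tier grammar come from this page; the request body field (document_url) and the token value are hypothetical placeholders, not the documented API shape.

```python
import json
import urllib.request

# Sketch of a tiered call to the live OCR endpoint. The endpoint path, bearer
# auth, and tier grammar come from the page; the "document_url" field and the
# token are hypothetical placeholders.
def build_ocr_request(document_url: str, tier: str = "balanced",
                      token: str = "YOUR_TOKEN") -> urllib.request.Request:
    if tier not in ("sota", "balanced", "cheap"):
        raise ValueError(f"unknown tier: {tier!r}")
    body = json.dumps({"document_url": document_url, "tier": tier}).encode()
    return urllib.request.Request(
        "https://hardparse.com/v1/ocr",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # one token for every task
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_ocr_request("https://example.com/receipt.pdf", tier="cheap")
# urllib.request.urlopen(req) would send it; omitted here.
```

Switching tiers is a one-word change to the request body; nothing else in the integration moves.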
§ 04 · Manifesto

Intelligence as a commodity.

Oil has grades. Electricity has tariffs. Shipping has class codes. Every mature market commoditises by standardising the contract, not the molecule. Intelligence is next — and the contract is the thing worth building.

OpenAI, Anthropic, Google are refineries. They ship something extraordinary, but a refinery's output is only useful once the market around it standardises how you buy, price, and substitute it. On OCR today, we can already quote three interchangeable grades against the same assay. The rest of the tasks follow the same shape.

  • Grade · Brent vs WTI → sota / balanced / cheap
  • Contract · barrel specification → POST /v1/<task>
  • Quality cert · assay report → the Codesota benchmark
  • Spot price · $ per barrel → $ per 1K calls

Codesota is building the assay and the contract. The benchmark registry is the assay report. The task endpoint is the grade. Everything else is implementation detail.

§ 05 · Roadmap

Tasks in flight.

What has shipped, what's in design partnership, what's ranked and waiting its turn. We publish this table as-is so readers can see the gap between the registry and the router — and help close it.


Live means the endpoint accepts production traffic today. In design means we are co-building with one or more partner teams. Ranked means the benchmark exists in the registry but no endpoint is served yet.

Endpoint status · Apr 2026
Sorted: live → design → ranked
  Task                        Modality     Registry coverage         Endpoint        Status
  Document OCR                Vision       OmniDocBench · Mar 2026   /v1/ocr         Live
  Text-to-Speech              Speech       ranked                    /v1/tts         In design
  Speech-to-Text (ASR)        Speech       ranked                    /v1/stt         In design
  Visual Question Answering   Multimodal   ranked                    /v1/vqa         In design
  Translation                 Text         ranked · not served       (none)          Ranked
Fig 5 · The final row stands in for the long tail of benchmarks the registry tracks but the router does not yet serve. Full index: /tasks.
§ 06 · Designed for

Agents we're designed for.

Public products whose shape matches the router. Listed as design targets, not customer logos. If you're building one of these, the rest of this page is for you.

Design target

OpenClaw

Open-source desktop agent that takes real actions — files, browsing, shell. Replaces eight vendor SDKs with one router contract.

Needs: OCR · code · retrieval · STT
Design target

Hermes

Nous Research's Hermes Agent routes through 20+ models on OpenRouter today. Codesota folds the cross-modality work (OCR, speech, vision) into the same tier contract.

Needs: multimodal I/O · cost-tier control
Design target

Claude Code

Coding agent class that benefits from cheap-tier retrieval and embedded OCR when the user drops a screenshot or PDF. One bearer token for the non-LLM work.

Needs: OCR · retrieval · code
Design target

General automation agents

RPA tools, DevOps agents, QA bots. All of them need the same non-LLM modalities behind a contract they can depend on for two years.

Needs: TTS · STT · OCR · retrieval
§ 07 · Design partners

Register as a design partner.

Three doors, one building. Pick the one that matches what you're shipping this quarter.

Codesota is independent, open, and built in Warsaw. The benchmark registry is free to read forever. The router is free to try on OCR. Paid tiers come after design partnership — not before.

Door 01

I want the OCR endpoint today

Production-ready. Bearer token, one POST, tiered response. Start on hardparse.com.

Open hardparse.com →
Door 02

I want my task next

Propose the benchmark, the dataset, and the endpoint shape. We co-build in design partnership.

Submit a task proposal →
Door 03

I'm building an agent

One API, every non-LLM modality, three tiers — drop in as a design partner before we price the general tier.

Contact the team →