Agentic · Market structure · Snapshot 2026-04-14

The AI inference market isn't choosing a winner. It's stratifying into three lanes.

Benchmark leaders don't lose revenue — they lose the long tail of workloads they used to own. When you look at what AI apps actually route through OpenRouter and match it to live pricing, a clear three-tier structure appears: SOTA models absorbing the dollar spend, cost-effective mid-tier handling daily workflows, and commodity models batching the scale workloads at pennies per million tokens. Here's how the market actually splits, with data.

| Tier | Price band | Models | Monthly cost | Tokens | % of spend | % of tokens |
|------|------------|--------|--------------|--------|------------|-------------|
| SOTA · Premium | ≥ $5/M blended | 12 | $49.41M | 6.19T | 66% | 18% |
| Cost-effective | $0.50–$5/M blended | 44 | $22.14M | 16.51T | 30% | 47% |
| Commodity · Scale | < $0.50/M blended | 40 | $3.08M | 12.11T | 4% | 35% |

The upside-down insight: SOTA models are 66% of spend but only 18% of tokens. Commodity models are 35% of tokens but only 4% of spend. The market pays premium prices for a narrow slice of quality-critical work and routes everything else to whoever's cheapest.
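The tier totals imply an average blended price per tier: divide monthly spend by monthly tokens. A quick back-of-envelope sanity check, using the figures from the cards above (an illustrative sketch, not the site's actual pipeline):

```python
# Implied blended price per tier: monthly spend / monthly tokens.
# Spend in $M and tokens in T divide to $/M directly,
# since 1T tokens = 1M million-tokens.
tiers = {
    "sota":           {"spend_musd": 49.41, "tokens_t": 6.19},
    "cost_effective": {"spend_musd": 22.14, "tokens_t": 16.51},
    "commodity":      {"spend_musd": 3.08,  "tokens_t": 12.11},
}

def implied_blended(spend_musd: float, tokens_t: float) -> float:
    """Average realized $ per million tokens for a tier."""
    return spend_musd / tokens_t

for name, t in tiers.items():
    price = implied_blended(t["spend_musd"], t["tokens_t"])
    print(f"{name}: ${price:.2f}/M")
```

Each implied price lands inside its own band (about $7.98/M for SOTA, $1.34/M for cost-effective, $0.25/M for commodity), so the tier totals are internally consistent with the cutoffs.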

The data behind the tiers

This isn't an editorial choice. It's a distribution.

If you plot every priced model on OpenRouter by its blended price, the distribution is bimodal — one dense peak in the commodity band, another in the cost-effective band, and then a sparse premium tail above $5/M. The same shape told two ways below: first by how many dollars flow through each price band (the economic shape), then by how many models inhabit it.
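The banding itself is mechanical. A minimal sketch of the cutoff logic, using the $0.50 and $5 boundaries stated above (the example blended prices are taken from the tier tables later in this analysis):

```python
def tier(blended_price_per_m: float) -> str:
    """Assign a model to a price tier by its blended $/M price.
    Cutoffs match this analysis: < $0.50 commodity,
    $0.50 to $5 cost-effective, >= $5 SOTA."""
    if blended_price_per_m < 0.50:
        return "commodity"
    if blended_price_per_m < 5.00:
        return "cost-effective"
    return "sota"

print(tier(0.16))   # Step 3.5 Flash -> commodity
print(tier(1.56))   # MiMo-V2-Pro -> cost-effective
print(tier(10.60))  # Claude Opus 4.6 -> sota
```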

Dollar spend by price band

Where the money actually flows · log-scale buckets

[Bar chart: monthly dollar spend per blended-price bucket, $0.05/M to $25/M on a log axis, tier cutoffs marked at $0.50 and $5. Spend per bucket ranges from under $100 in the cheapest bins to $25.98M at the top of the premium tail.]

The real shape of the market. The SOTA tail above $5/M is narrow in bins but tall in dollars — premium pricing compounds small token volumes into huge spend. Meanwhile commodity models sit near zero by dollars even where they host most of the token volume. Same models, opposite shape depending on the metric.

Model count by price band

How many priced models live at each price · 96 total

[Bar chart: model count per blended-price bucket, $0.05/M to $25/M on a log axis, tier cutoffs marked at $0.50 and $5; bars colored by tier: Commodity, Cost-effective, SOTA.]

Bimodal distribution. The commodity band holds 40 models, the cost-effective band holds 44, and SOTA is a sparse tail of 12. Two density peaks plus a long tail — the shape is closer to two clouds than three clusters. The premium tier is a handful of models that matter enormously on the dollar chart above and barely register here.

Price vs monthly token volume

Each dot is a model · log-log · sized by monthly cost

[Scatter plot: blended price per million tokens on a log x-axis, $0.1/M to $25/M, vs monthly tokens on a log y-axis, 1B to 10T.]

What to see: the cloud tilts negative — higher price, lower token volume. SOTA-tier dots sit top-right by cost (big circles) but anchor low on the y-axis. The commodity tier (left) carries the highest volumes. This is the anti-correlation: the market routes most tokens to whichever model is cheapest.

Concrete comparison: 1B tokens/month reference workload

Using the average prices of the models within each tier (not cherry-picked extremes), with a 72/28 input/output token split. Scale this by your actual volume to see what a tier choice costs you.

| Tier | Cost for 1B tokens/mo | avg $/M in | avg $/M out | avg blended | Annual |
|------|----------------------|-----------|------------|-------------|--------|
| Tier 1 · SOTA | $12K/mo | $5.13 | $30.08 | $12.11 | $145K |
| Tier 2 · Cost-effective | $2K/mo | $0.81 | $3.81 | $1.65 | $20K |
| Tier 3 · Commodity | $222/mo | $0.13 | $0.47 | $0.22 | $3K |

What to see: the same 1B tokens that cost $12K/month on SOTA run for $222/month on commodity — a 54× difference. At 10B tokens/month the gap is $119K/month — the difference between a profitable product and an unprofitable one can live in this single decision.
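The card arithmetic is reproducible: blended price is the 72/28 weighted average of input and output prices stated above, and monthly cost scales linearly with volume. A sketch with the tier averages copied from the cards:

```python
def blended(in_per_m: float, out_per_m: float, in_frac: float = 0.72) -> float:
    """Blended $/M at the document's 72/28 input/output token split."""
    return in_frac * in_per_m + (1 - in_frac) * out_per_m

def monthly_cost(blended_per_m: float, tokens_b: float) -> float:
    """Monthly $ for a workload of tokens_b billion tokens (1B = 1000M)."""
    return blended_per_m * tokens_b * 1000

sota = blended(5.13, 30.08)      # ~$12.12/M blended
commodity = blended(0.13, 0.47)  # ~$0.23/M blended
gap = monthly_cost(sota, 10) - monthly_cost(commodity, 10)
print(f"10B tokens/mo gap: ${gap:,.0f}")  # roughly the $119K/mo cited above
```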

Tier 1 · SOTA

The premium lane — use when correctness is the constraint

These are the models you reach for when a wrong answer is more expensive than an extra dollar of compute. Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro. They top the agentic benchmarks (BinaryAudit, SWE-bench, OTelBench), they're slower, they're priced at $3–$30 input / $15–$150 output. In exchange, they do the thing you actually needed done.

Use when

  • Financial decisions, legal review, compliance flags
  • Critical-path reasoning in agentic workflows where each wrong step compounds
  • Final-answer generation after cheaper models have pre-filtered
  • Customer-facing output where quality is visible

Don't use for

  • Batch processing millions of documents
  • Bulk classification, OCR post-processing, or log parsing
  • Tasks where a 2× quality bump isn't worth 20× cost
| # | Model | $/M in | $/M out | Blended | Tokens | Monthly cost | Apps |
|---|-------|--------|---------|---------|--------|--------------|------|
| 1 | Anthropic: Claude Opus 4.6 | $5.00 | $25.00 | $10.60 | 2.37T | $25.10M | 24 |
| 2 | Anthropic: Claude Sonnet 4.6 | $3.00 | $15.00 | $6.36 | 2.62T | $16.67M | 24 |
| 3 | Anthropic: Claude Sonnet 4.5 | $3.00 | $15.00 | $6.36 | 460.6B | $2.93M | 18 |
| 4 | OpenAI: GPT-5.4 | $2.50 | $15.00 | $6.00 | 478.4B | $2.87M | 17 |
| 5 | OpenAI: GPT-5.3-Codex | $1.75 | $14.00 | $5.18 | 172.2B | $892K | 10 |
| 6 | Anthropic: Claude Opus 4.5 | $5.00 | $25.00 | $10.60 | 83.1B | $881K | 10 |
| 7 | Anthropic: Claude 3.7 Sonnet | $3.00 | $15.00 | $6.36 | 3.5B | $22K | 3 |
| 8 | OpenAI: GPT-5.4 Pro | $30.00 | $180.00 | $72.00 | 232.1M | $17K | 2 |
| 9 | OpenAI: GPT-5.2 | $1.75 | $14.00 | $5.18 | 2.8B | $15K | 4 |
| 10 | Anthropic: Claude Sonnet 4 | $3.00 | $15.00 | $6.36 | 957.9M | $6K | 2 |
+ 2 more in this tier

Tier 2 · Cost-effective

The workhorse lane — where most real work actually runs

The sweet spot between quality and cost. Gemini 3 Flash, Qwen3.6 Plus, MiMo-V2-Pro, MiniMax M2.7, DeepSeek V3.2, GPT-5.4 Mini. Priced at $0.30–$2 input / $1–$5 output, often delivering 80–90% of SOTA quality on most tasks at 5–20% of the cost. This is the tier the OpenRouter data shows is winning on tokens — the vendors gaining share are all here.

Use when

  • Production AI features where cost sensitivity matters
  • Daily agentic work — code assistants, research, drafting
  • Tool-use and function-calling at reasonable volume
  • Anywhere SOTA is overkill but commodity is too weak

Don't use for

  • Adversarial reasoning or high-stakes decisions
  • Tasks where you've verified a specific SOTA model has a meaningful quality lead
  • Extreme scale batch jobs where commodity pricing wins
| # | Model | $/M in | $/M out | Blended | Tokens | Monthly cost | Apps |
|---|-------|--------|---------|---------|--------|--------------|------|
| 1 | Xiaomi: MiMo-V2-Pro | $1.00 | $3.00 | $1.56 | 5.49T | $8.57M | 15 |
| 2 | Qwen: Qwen3.6 Plus | $0.33 | $1.95 | $0.78 | 2.98T | $2.33M | 27 |
| 3 | Z.ai: GLM 5 Turbo | $1.20 | $4.00 | $1.98 | 2.84T | $5.64M | 7 |
| 4 | MiniMax: MiniMax M2.7 | $0.30 | $1.20 | $0.55 | 1.72T | $947K | 19 |
| 5 | Google: Gemini 3 Flash Preview | $0.50 | $3.00 | $1.20 | 994.8B | $1.19M | 24 |
| 6 | MoonshotAI: Kimi K2.5 | $0.38 | $1.72 | $0.76 | 629.5B | $477K | 19 |
| 7 | Xiaomi: MiMo-V2-Omni | $0.40 | $2.00 | $0.85 | 466.0B | $395K | 3 |
| 8 | Anthropic: Claude Haiku 4.5 | $1.00 | $5.00 | $2.12 | 444.5B | $942K | 13 |
| 9 | Google: Gemini 2.5 Flash | $0.30 | $2.50 | $0.92 | 243.7B | $223K | 10 |
| 10 | Z.ai: GLM 5 | $0.72 | $2.30 | $1.16 | 197.8B | $230K | 16 |
+ 34 more in this tier

Tier 3 · Commodity scale

The scale lane — for workloads measured in billions of tokens

Priced at < $0.50/M blended. MiMo-V2-Flash, Step 3.5 Flash, Trinity Large, free tiers of larger open-weight models. This is the tier to reach for when the question is no longer "is this the best?" but "can we afford to run this against the whole corpus?". Quality varies wildly — some of these match the mid-tier on narrow tasks, some are only useful for trivial classification.

Use when

  • Batch processing at scale — billions of tokens per month
  • Pre-filtering, triage, and cheap first-pass classification
  • Post-processing and format conversion (where rules + small LLM beat a big LLM)
  • Offline enrichment where latency doesn't matter

Don't use for

  • Agentic decisions that affect downstream state
  • Customer-facing text generation
  • Anywhere you haven't verified the specific model actually works on your task
| # | Model | $/M in | $/M out | Blended | Tokens | Monthly cost | Apps |
|---|-------|--------|---------|---------|--------|--------------|------|
| 1 | StepFun: Step 3.5 Flash | $0.10 | $0.30 | $0.16 | 3.99T | $623K | 16 |
| 2 | MiniMax: MiniMax M2.5 | $0.12 | $0.99 | $0.36 | 3.70T | $1.34M | 15 |
| 3 | DeepSeek: DeepSeek V3.2 | $0.26 | $0.38 | $0.29 | 1.26T | $371K | 24 |
| 4 | NVIDIA: Nemotron 3 Super | $0.10 | $0.50 | $0.21 | 1.17T | $248K | 15 |
| 5 | Arcee AI: Trinity Large Thinking | $0.22 | $0.85 | $0.40 | 604.7B | $240K | 12 |
| 6 | Google: Gemini 2.5 Flash Lite | $0.10 | $0.40 | $0.18 | 591.4B | $109K | 3 |
| 7 | Xiaomi: MiMo-V2-Flash | $0.09 | $0.29 | $0.15 | 234.8B | $34K | 6 |
| 8 | Mistral: Mistral Nemo | $0.02 | $0.04 | $0.03 | 195.7B | $5K | 1 |
| 9 | DeepSeek: DeepSeek V3 0324 | $0.20 | $0.77 | $0.36 | 94.6B | $34K | 4 |
| 10 | Z.ai: GLM 4.5 Air | $0.13 | $0.85 | $0.33 | 60.6B | $20K | 4 |
+ 30 more in this tier

A decision framework in four questions

The mistake most teams make is picking a single model for every workload. The market's behavior tells you to route by tier instead.

  1. Is a wrong answer expensive?

     If a single bad call costs more than $10 in downstream impact (disputed invoice, broken migration, misrouted support ticket), stay in Tier 1 SOTA for that step. The pricing premium is insurance.

  2. Can you split the workload?

     Most agentic workflows have a critical step buried in ten ancillary ones. Route only the critical step to SOTA; run the rest on Tier 2 cost-effective. This is how the apps winning on unit economics are set up.

  3. What's your token volume?

     Under 100M tokens/month: tier choice matters less; pick on quality. Over 1B tokens/month: you cannot afford a blanket Tier 1 — the bill will eat your margin. This is where Tier 3 commodity for bulk paths becomes non-negotiable.

  4. Is this task quality-verified on that model?

     A $0.10/M model that hallucinates 30% of the time on your task is more expensive than a $10/M model that doesn't. Price is only half the equation. Verify against a held-out set before committing to any tier — especially commodity.
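Translated into code, the four questions collapse into a per-step routing policy. A sketch under the thresholds this framework states ($10 error cost, 1B tokens/month); the field and tier names are illustrative, not any real API:

```python
from dataclasses import dataclass

@dataclass
class Step:
    error_cost_usd: float        # downstream cost of one wrong answer
    tokens_b_per_month: float    # monthly volume, in billions of tokens
    verified_on_commodity: bool  # passed a held-out eval on a Tier 3 model

def route(step: Step) -> str:
    """Pick a tier for one workload step, per the four questions above."""
    if step.error_cost_usd > 10:        # Q1: wrong answers are expensive
        return "tier-1-sota"
    if step.tokens_b_per_month >= 1 and step.verified_on_commodity:
        return "tier-3-commodity"       # Q3 + Q4: bulk path, quality verified
    return "tier-2-cost-effective"      # default workhorse lane

print(route(Step(50, 0.1, False)))  # critical step -> tier-1-sota
print(route(Step(0.5, 20, True)))   # bulk, verified -> tier-3-commodity
print(route(Step(0.5, 20, False)))  # bulk, unverified -> tier-2-cost-effective
```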

Why this analysis exists

Every benchmark site ranks models by a single number. Every routing service picks for you. Neither of those helps when the real answer is "use three models, one for each lane, and know where each boundary sits".

CodeSOTA joins benchmark performance, live pricing, and real OpenRouter usage into one view — so you can pick not just which model but which tier for which step. That's the decision-engine layer on top of the raw catalogs.

Related: one year of market trends · inverted model leaderboard · app-level spend rankings


Disagree with our tier boundaries?

If you think the $5/M and $0.50/M cutoffs are wrong — or you've run a specific model head-to-head against one we'd put a tier higher — tell us. We reply within 48 hours and update the analysis.

Tell us what you found →