Hermes Agent: what 1.96T tokens actually cost
Nous Research's Hermes Agent is an open-source self-improving agent that routes every call through OpenRouter. The routing is visible — which means we can see exactly which models carry the load, and price the bill. Here's the 30-day mix, ranked two ways: total spend and blended cost per million tokens.
- **1.96T** total tokens (30d)
- **$3.15M** monthly cost
- **#2** global OpenRouter rank
- **255** distinct models used
The expensive thing isn't the cheap thing
- Claude Opus 4.6 is Hermes Agent's single largest line item at $1.12M/month — 35.7% of known spend — despite being 4th by token volume. A 106B-token slice of premium-priced Opus beats a 510B-token flood of Qwen on raw dollars.
- The two Qwen3.6 Plus lanes (main + preview) combined move 561.9B tokens — the biggest single-family workload — but cost ~$438K, still less than half of Opus 4.6's bill.
- Cheapest non-free workhorse: MiMo-V2-Flash at $0.15/M blended. Step 3.5 Flash is next at $0.16/M. An agent willing to route tool-use rounds to these would pay ~1.5% of Opus 4.6's blended rate — a ~98% cut in per-token spend.
- Two free models (Nemotron 3 Super, MiniMax M2.5) absorb 127.2B tokens of load at zero cost — which means Hermes's "real" paid token volume is closer to ~1.83T.
Ranked by monthly spend
Estimated cost = tokens × blended $/M, where blended $/M uses a 72/28 input/output split (agentic tool-use profile). Pricing pulled from openrouter.ai/api/v1/models on 2026-04-14.
| # | Model | Vendor | Tokens | $/M in | $/M out | Blended $/M | Monthly cost | % of spend |
|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 | Anthropic | 106B | $5.00 | $25.00 | $10.60 | $1.12M | 35.7% |
| 2 | MiMo-V2-Pro | Xiaomi | 482B | $1.00 | $3.00 | $1.56 | $752K | 23.9% |
| 3 | Qwen3.6 Plus | Qwen | 510B | $0.33 | $1.95 | $0.78 | $398K | 12.6% |
| 4 | GPT-5.4 | OpenAI | 45.5B | $2.50 | $15.00 | $6.00 | $273K | 8.7% |
| 5 | Claude Sonnet 4.6 | Anthropic | 37.5B | $3.00 | $15.00 | $6.36 | $239K | 7.6% |
| 6 | MiniMax M2.7 | MiniMax | 132B | $0.30 | $1.20 | $0.55 | $73K | 2.3% |
| 7 | Gemini 3 Flash Preview | Google | 52.5B | $0.50 | $3.00 | $1.20 | $63K | 2.0% |
| 8 | Gemini 3.1 Pro Preview | Google | 11.4B | $2.00 | $12.00 | $4.80 | $55K | 1.7% |
| 9 | Qwen3.6 Plus Preview | Qwen | 51.9B | $0.33 | $1.95 | $0.78 | $40K | 1.3% |
| 10 | GLM 5.1 | Z.ai | 25.7B | $0.95 | $3.15 | $1.57 | $40K | 1.3% |
| 11 | GPT-5.4 Mini | OpenAI | 13.4B | $0.75 | $4.50 | $1.80 | $24K | 0.8% |
| 12 | Kimi K2.5 | Moonshot AI | 30.4B | $0.38 | $1.72 | $0.76 | $23K | 0.7% |
| 13 | Gemini 2.5 Flash | Google | 15.5B | $0.30 | $2.50 | $0.92 | $14K | 0.5% |
| 14 | DeepSeek V3.2 | DeepSeek | 20.5B | $0.40 | $1.20 | $0.62 | $13K | 0.4% |
| 15 | Step 3.5 Flash | StepFun | 71.2B | $0.10 | $0.30 | $0.16 | $11K | 0.4% |
| 16 | Trinity Large Preview | Arcee AI | 11.9B | $0.22 | $0.85 | $0.40 | $5K | 0.1% |
| 17 | MiMo-V2-Flash | Xiaomi | 12.4B | $0.09 | $0.29 | $0.15 | $2K | 0.1% |
| 18 | Nemotron 3 Super (free) | NVIDIA | 102B | $0.00 | $0.00 | $0.00 | $0 | 0.0% |
| 19 | MiniMax M2.5 (free) | MiniMax | 25.2B | $0.00 | $0.00 | $0.00 | $0 | 0.0% |
| 20 | Hunter Alpha | OpenRouter | 23.9B | — | — | — | — | — |
| — | Total (known-price rows) | — | 1.76T | — | — | — | $3.15M | 100% |
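The table's arithmetic is simple to sanity-check. A minimal sketch, using the Claude Opus 4.6 row's prices and token count from the table and the article's 72/28 split assumption:

```python
# Reproduce the table's blended $/M and monthly cost for one row.
# Prices and token counts come from the table; the 72/28 input/output
# split is the article's modeling assumption.
IN_SHARE, OUT_SHARE = 0.72, 0.28

def blended_per_m(price_in: float, price_out: float) -> float:
    """Blended $ per million tokens under the assumed input/output split."""
    return IN_SHARE * price_in + OUT_SHARE * price_out

def monthly_cost(tokens_billions: float, price_in: float, price_out: float) -> float:
    """Estimated monthly cost in dollars: tokens x blended $/M."""
    return tokens_billions * 1_000 * blended_per_m(price_in, price_out)

# Claude Opus 4.6: $5.00 in / $25.00 out, 106B tokens
print(round(blended_per_m(5.00, 25.00), 2))             # 10.6
print(round(monthly_cost(106, 5.00, 25.00) / 1e6, 2))   # 1.12
```

The same two functions reproduce every priced row in the table to within rounding.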
Ranked by blended cost per million tokens
Pure price efficiency — free models excluded. Lower is cheaper. This is the lens an agent router should use if it's delegating cheap reasoning steps.
| # | Model | Vendor | Blended $/M | Tokens | vs Opus 4.6 |
|---|---|---|---|---|---|
| 1 | MiMo-V2-Flash | Xiaomi | $0.15 | 12.4B | 1.4% |
| 2 | Step 3.5 Flash | StepFun | $0.16 | 71.2B | 1.5% |
| 3 | Trinity Large Preview | Arcee AI | $0.40 | 11.9B | 3.7% |
| 4 | MiniMax M2.7 | MiniMax | $0.55 | 132B | 5.2% |
| 5 | DeepSeek V3.2 | DeepSeek | $0.62 | 20.5B | 5.9% |
| 6 | Kimi K2.5 | Moonshot AI | $0.76 | 30.4B | 7.1% |
| 7 | Qwen3.6 Plus | Qwen | $0.78 | 510B | 7.4% |
| 8 | Qwen3.6 Plus Preview | Qwen | $0.78 | 51.9B | 7.4% |
| 9 | Gemini 2.5 Flash | Google | $0.92 | 15.5B | 8.6% |
| 10 | Gemini 3 Flash Preview | Google | $1.20 | 52.5B | 11.3% |
| 11 | MiMo-V2-Pro | Xiaomi | $1.56 | 482B | 14.7% |
| 12 | GLM 5.1 | Z.ai | $1.57 | 25.7B | 14.8% |
| 13 | GPT-5.4 Mini | OpenAI | $1.80 | 13.4B | 17.0% |
| 14 | Gemini 3.1 Pro Preview | Google | $4.80 | 11.4B | 45.3% |
| 15 | GPT-5.4 | OpenAI | $6.00 | 45.5B | 56.6% |
| 16 | Claude Sonnet 4.6 | Anthropic | $6.36 | 37.5B | 60.0% |
| 17 | Claude Opus 4.6 | Anthropic | $10.60 | 106B | 100.0% |
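The routing lens above can be sketched in a few lines. The tier assignments below are illustrative assumptions, not Hermes Agent's actual policy; the blended rates are copied from the table:

```python
# A minimal price-aware routing sketch: given a task "tier", pick the
# cheapest model in that tier. Tiers here are hypothetical labels;
# blended $/M rates come from the efficiency table.
CATALOG = [
    # (model, blended $/M, tier)
    ("MiMo-V2-Flash",     0.15,  "light"),
    ("Step 3.5 Flash",    0.16,  "light"),
    ("Qwen3.6 Plus",      0.78,  "mid"),
    ("MiMo-V2-Pro",       1.56,  "mid"),
    ("Claude Sonnet 4.6", 6.36,  "heavy"),
    ("Claude Opus 4.6",   10.60, "heavy"),
]

def route(tier: str) -> str:
    """Return the cheapest listed model for the requested tier."""
    candidates = [(price, name) for name, price, t in CATALOG if t == tier]
    return min(candidates)[1]

print(route("light"))  # MiMo-V2-Flash
print(route("heavy"))  # Claude Sonnet 4.6
```

Even this toy version shows the spread: a "light" round costs 1.4% of what the same tokens cost on Opus 4.6.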
Methodology & caveats
- Token counts are the 30-day figures reported on openrouter.ai/apps/hermes-agent (top-20 slice, 1.96T total across 255 models).
- Pricing comes from the public `/api/v1/models` endpoint on 2026-04-14. Where OpenRouter returns multiple variants, we pick the default (non-preview, non-fast).
- The 72/28 input/output split is a modeling assumption; OpenRouter exposes total tokens but not the direction. Agentic workloads tend to be context-heavy because every tool round replays the conversation. Flipping to 50/50 raises the known-price bill by ~39%; 60/40 raises it by ~21%.
- Hunter Alpha is an unpriced alpha model — no cost row. Once it graduates to a priced slot, the total will shift.
- Preview vs stable: Qwen3.6 Plus Preview is priced off the stable Qwen3.6 Plus lane since OpenRouter doesn't surface the preview separately. Trinity Large Preview is listed as free on OpenRouter; we price it at the non-free Trinity Large Thinking rate to keep the comparison apples-to-apples.
- Not a quality ranking. Cost-per-token says nothing about whether a model actually solves the task. See our agentic benchmarks for SWE-bench, BinaryAudit, OTelBench, and YC-Bench scores on most of these models.
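The split sensitivity is reproducible from the spend table alone. A sketch that recomputes the known-price bill under alternative splits (per-model prices and token counts copied from the table; only the input-share fraction varies):

```python
# Recompute the known-price monthly bill under different input/output splits.
# (model, tokens in billions, $/M input, $/M output), copied from the table.
ROWS = [
    ("Claude Opus 4.6", 106, 5.00, 25.00),
    ("MiMo-V2-Pro", 482, 1.00, 3.00),
    ("Qwen3.6 Plus", 510, 0.33, 1.95),
    ("GPT-5.4", 45.5, 2.50, 15.00),
    ("Claude Sonnet 4.6", 37.5, 3.00, 15.00),
    ("MiniMax M2.7", 132, 0.30, 1.20),
    ("Gemini 3 Flash Preview", 52.5, 0.50, 3.00),
    ("Gemini 3.1 Pro Preview", 11.4, 2.00, 12.00),
    ("Qwen3.6 Plus Preview", 51.9, 0.33, 1.95),
    ("GLM 5.1", 25.7, 0.95, 3.15),
    ("GPT-5.4 Mini", 13.4, 0.75, 4.50),
    ("Kimi K2.5", 30.4, 0.38, 1.72),
    ("Gemini 2.5 Flash", 15.5, 0.30, 2.50),
    ("DeepSeek V3.2", 20.5, 0.40, 1.20),
    ("Step 3.5 Flash", 71.2, 0.10, 0.30),
    ("Trinity Large Preview", 11.9, 0.22, 0.85),
    ("MiMo-V2-Flash", 12.4, 0.09, 0.29),
]

def total_bill(input_share: float) -> float:
    """Known-price monthly bill in dollars for a given input-token share."""
    out_share = 1.0 - input_share
    return sum(
        tokens_b * 1_000 * (input_share * p_in + out_share * p_out)
        for _, tokens_b, p_in, p_out in ROWS
    )

base = total_bill(0.72)
print(f"72/28: ${base / 1e6:.2f}M")
print(f"50/50: +{total_bill(0.50) / base - 1:.0%}")
print(f"60/40: +{total_bill(0.60) / base - 1:.0%}")
```

Because output tokens are priced several times higher than input tokens on nearly every model, any shift toward output makes the bill grow quickly; the 72/28 assumption is the optimistic end of the range.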
Know Hermes Agent's real cost structure?
If our 72/28 input/output split assumption is off, or you see a model we missed, tell us. We'll re-run the analysis.