Agentic · Model mix · Snapshot 2026-04-14

Hermes Agent: what 1.96T tokens actually cost

Nous Research's Hermes Agent is an open-source self-improving agent that routes every call through OpenRouter. Because that routing is public, we can see which models carry the load and put a price on the bill. Here's the 30-day mix, ranked two ways: total spend and blended cost per million tokens.

  • Total tokens (30d): 1.96T
  • Monthly cost: $3.15M (estimated, known-price top-20 rows)
  • Global OpenRouter rank: #2
  • Distinct models used: 255

The biggest bill isn't the biggest workload

  • Claude Opus 4.6 is Hermes Agent's single largest line item at $1.12M/month — 35.7% of known spend — despite being 4th by token volume. A 106B-token slice of premium-priced Opus beats a 510B-token flood of Qwen on raw dollars.
  • The two Qwen3.6 Plus lanes (main + preview) combined move 561.9B tokens — the biggest single-family workload — but cost ~$438K, still less than half of Opus 4.6's bill.
  • Cheapest non-free workhorse: MiMo-V2-Flash at $0.15/M blended. Step 3.5 Flash is next at $0.16/M. Both cost about 1.5% of Opus 4.6's $10.60/M, so an agent willing to route tool-use rounds to them would cut its per-token spend on those rounds by ~98% vs Opus 4.6.
  • Two free models (Nemotron 3 Super, MiniMax M2.5) absorb 127.2B tokens of load at zero cost, which puts Hermes's "real" paid token volume at about 1.83T (1.96T minus 127.2B).

Ranked by monthly spend

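Estimated cost = tokens × blended $/M, where blended $/M = 0.72 × ($/M in) + 0.28 × ($/M out), a 72/28 input/output split (agentic tool-use profile). For example, Claude Opus 4.6 blends to 0.72 × $5.00 + 0.28 × $25.00 = $10.60/M. Pricing pulled from openrouter.ai/api/v1/models on 2026-04-14.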

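| # | Model | Vendor | Tokens | $/M in | $/M out | Blended $/M | Monthly cost | % of spend |
|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 | Anthropic | 106B | $5.00 | $25.00 | $10.60 | $1.12M | 35.7% |
| 2 | MiMo-V2-Pro | Xiaomi | 482B | $1.00 | $3.00 | $1.56 | $752K | 23.9% |
| 3 | Qwen3.6 Plus | Qwen | 510B | $0.33 | $1.95 | $0.78 | $398K | 12.6% |
| 4 | GPT-5.4 | OpenAI | 45.5B | $2.50 | $15.00 | $6.00 | $273K | 8.7% |
| 5 | Claude Sonnet 4.6 | Anthropic | 37.5B | $3.00 | $15.00 | $6.36 | $239K | 7.6% |
| 6 | MiniMax M2.7 | MiniMax | 132B | $0.30 | $1.20 | $0.55 | $73K | 2.3% |
| 7 | Gemini 3 Flash Preview | Google | 52.5B | $0.50 | $3.00 | $1.20 | $63K | 2.0% |
| 8 | Gemini 3.1 Pro Preview | Google | 11.4B | $2.00 | $12.00 | $4.80 | $55K | 1.7% |
| 9 | Qwen3.6 Plus Preview | Qwen | 51.9B | $0.33 | $1.95 | $0.78 | $40K | 1.3% |
| 10 | GLM 5.1 | Z.ai | 25.7B | $0.95 | $3.15 | $1.57 | $40K | 1.3% |
| 11 | GPT-5.4 Mini | OpenAI | 13.4B | $0.75 | $4.50 | $1.80 | $24K | 0.8% |
| 12 | Kimi K2.5 | Moonshot AI | 30.4B | $0.38 | $1.72 | $0.76 | $23K | 0.7% |
| 13 | Gemini 2.5 Flash | Google | 15.5B | $0.30 | $2.50 | $0.92 | $14K | 0.5% |
| 14 | DeepSeek V3.2 | DeepSeek | 20.5B | $0.40 | $1.20 | $0.62 | $13K | 0.4% |
| 15 | Step 3.5 Flash | StepFun | 71.2B | $0.10 | $0.30 | $0.16 | $11K | 0.4% |
| 16 | Trinity Large Preview | Arcee AI | 11.9B | $0.22 | $0.85 | $0.40 | $5K | 0.1% |
| 17 | MiMo-V2-Flash | Xiaomi | 12.4B | $0.09 | $0.29 | $0.15 | $2K | 0.1% |
| 18 | Nemotron 3 Super (free) | NVIDIA | 102B | $0.00 | $0.00 | $0.00 | $0 | 0.0% |
| 19 | MiniMax M2.5 (free) | MiniMax | 25.2B | $0.00 | $0.00 | $0.00 | $0 | 0.0% |
| 20 | Hunter Alpha | OpenRouter | 23.9B | n/a | n/a | n/a | n/a | n/a |
| | Total (known-price rows) | | 1.76T | | | | $3.15M | 100% |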
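
The per-row arithmetic behind this table can be reproduced in a few lines. This is a minimal sketch, not the pipeline used for the analysis: the function names `blended_rate` and `monthly_cost` are hypothetical, and the three sample rows are copied from the table.

```python
INPUT_SHARE = 0.72  # the article's modeling assumption: 72% input, 28% output


def blended_rate(price_in: float, price_out: float,
                 input_share: float = INPUT_SHARE) -> float:
    """Blended $/M tokens for a given input/output mix."""
    return input_share * price_in + (1.0 - input_share) * price_out


def monthly_cost(tokens_billions: float, price_in: float, price_out: float,
                 input_share: float = INPUT_SHARE) -> float:
    """Estimated 30-day cost in dollars (1B tokens = 1,000 units of 1M)."""
    return tokens_billions * 1_000 * blended_rate(price_in, price_out, input_share)


# (model, tokens in billions, $/M in, $/M out), copied from the table
ROWS = [
    ("Claude Opus 4.6", 106.0, 5.00, 25.00),
    ("MiMo-V2-Pro", 482.0, 1.00, 3.00),
    ("GPT-5.4", 45.5, 2.50, 15.00),
]

for name, tokens_b, p_in, p_out in ROWS:
    rate = blended_rate(p_in, p_out)
    cost = monthly_cost(tokens_b, p_in, p_out)
    print(f"{name}: ${rate:.2f}/M blended, ${cost / 1e3:,.0f}K per month")
```

Keeping `input_share` as a parameter makes it easy to re-run the same rows under a different split assumption.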

Ranked by blended cost per million tokens

Pure price efficiency — free models excluded. Lower is cheaper. This is the lens an agent router should use if it's delegating cheap reasoning steps.

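| # | Model | Vendor | Blended $/M | Tokens | vs Opus 4.6 |
|---|---|---|---|---|---|
| 1 | MiMo-V2-Flash | Xiaomi | $0.15 | 12.4B | 1.4% |
| 2 | Step 3.5 Flash | StepFun | $0.16 | 71.2B | 1.5% |
| 3 | Trinity Large Preview | Arcee AI | $0.40 | 11.9B | 3.7% |
| 4 | MiniMax M2.7 | MiniMax | $0.55 | 132B | 5.2% |
| 5 | DeepSeek V3.2 | DeepSeek | $0.62 | 20.5B | 5.9% |
| 6 | Kimi K2.5 | Moonshot AI | $0.76 | 30.4B | 7.1% |
| 7 | Qwen3.6 Plus | Qwen | $0.78 | 510B | 7.4% |
| 8 | Qwen3.6 Plus Preview | Qwen | $0.78 | 51.9B | 7.4% |
| 9 | Gemini 2.5 Flash | Google | $0.92 | 15.5B | 8.6% |
| 10 | Gemini 3 Flash Preview | Google | $1.20 | 52.5B | 11.3% |
| 11 | MiMo-V2-Pro | Xiaomi | $1.56 | 482B | 14.7% |
| 12 | GLM 5.1 | Z.ai | $1.57 | 25.7B | 14.8% |
| 13 | GPT-5.4 Mini | OpenAI | $1.80 | 13.4B | 17.0% |
| 14 | Gemini 3.1 Pro Preview | Google | $4.80 | 11.4B | 45.3% |
| 15 | GPT-5.4 | OpenAI | $6.00 | 45.5B | 56.6% |
| 16 | Claude Sonnet 4.6 | Anthropic | $6.36 | 37.5B | 60.0% |
| 17 | Claude Opus 4.6 | Anthropic | $10.60 | 106B | 100.0% |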
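
As a rough illustration of how a router might use this lens, the sketch below computes each model's rate relative to a baseline and picks the cheapest model from a shortlist. Everything here is an assumption for illustration: `relative_to` and `cheapest` are hypothetical helpers, the rates are a subset of the table's rounded blended figures, and deciding which models are good enough for a step is left to the caller.

```python
# Blended $/M at a 72/28 split, copied from the table (subset for illustration)
BLENDED = {
    "MiMo-V2-Flash": 0.15,
    "Step 3.5 Flash": 0.16,
    "MiniMax M2.7": 0.55,
    "Qwen3.6 Plus": 0.78,
    "MiMo-V2-Pro": 1.56,
    "Claude Sonnet 4.6": 6.36,
    "Claude Opus 4.6": 10.60,
}


def relative_to(baseline: str, rates: dict[str, float]) -> dict[str, float]:
    """Each model's blended rate as a percentage of the baseline's rate."""
    base = rates[baseline]
    return {name: 100.0 * rate / base for name, rate in rates.items()}


def cheapest(candidates: list[str], rates: dict[str, float]) -> str:
    """Lowest-priced model among candidates the caller already trusts."""
    return min(candidates, key=lambda name: rates[name])


if __name__ == "__main__":
    for name, pct in relative_to("Claude Opus 4.6", BLENDED).items():
        print(f"{name}: {pct:.1f}% of Opus 4.6")
    # Hypothetical shortlist of models judged adequate for a tool-use round
    print(cheapest(["Qwen3.6 Plus", "MiMo-V2-Pro", "Claude Sonnet 4.6"], BLENDED))
```

Price is only one input to that decision; the shortlist should come from whatever quality evidence the router has for the task at hand.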

Methodology & caveats

  • Token counts are the 30-day figures reported on openrouter.ai/apps/hermes-agent. The top-20 slice shown here covers about 1.78T of the 1.96T total across 255 models; the remaining long tail is not priced in this analysis.
  • Pricing comes from the public /api/v1/models endpoint on 2026-04-14. Where OpenRouter returns multiple variants, we pick the default (non-preview, non-fast).
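  • 72/28 input/output split is a modeling assumption: OpenRouter exposes total tokens but not the direction. Agentic workloads tend to be context-heavy because every tool round replays the conversation. The bill is sensitive to this choice because output prices run roughly 3x to 8x input prices: recomputing the priced rows from the listed token counts and prices, a 60/40 split raises the total by ~21% and a 50/50 split by ~39%.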
  • Hunter Alpha is an unpriced alpha model — no cost row. Once it graduates to a priced slot, the total will shift.
  • Preview vs stable: Qwen3.6 Plus Preview is priced off the stable Qwen3.6 Plus lane since OpenRouter doesn't surface the preview separately. Trinity Large Preview is listed as free on OpenRouter; we price it at the non-free Trinity Large Thinking rate to keep the comparison apples-to-apples.
  • Not a quality ranking. Cost-per-token says nothing about whether a model actually solves the task. See our agentic benchmarks for SWE-bench, BinaryAudit, OTelBench, and YC-Bench scores on most of these models.
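
The sensitivity of the total to the split assumption can be checked by re-running the priced rows under different input shares. This is a minimal sketch assuming the rounded token counts and list prices from the spend table; `PRICED_ROWS` and `total_bill` are hypothetical names, and the free and unpriced rows are left out because they contribute no cost.

```python
# (tokens in billions, $/M in, $/M out) for the 17 priced rows in the spend table
PRICED_ROWS = [
    (106.0, 5.00, 25.00),  # Claude Opus 4.6
    (482.0, 1.00, 3.00),   # MiMo-V2-Pro
    (510.0, 0.33, 1.95),   # Qwen3.6 Plus
    (45.5, 2.50, 15.00),   # GPT-5.4
    (37.5, 3.00, 15.00),   # Claude Sonnet 4.6
    (132.0, 0.30, 1.20),   # MiniMax M2.7
    (52.5, 0.50, 3.00),    # Gemini 3 Flash Preview
    (11.4, 2.00, 12.00),   # Gemini 3.1 Pro Preview
    (51.9, 0.33, 1.95),    # Qwen3.6 Plus Preview
    (25.7, 0.95, 3.15),    # GLM 5.1
    (13.4, 0.75, 4.50),    # GPT-5.4 Mini
    (30.4, 0.38, 1.72),    # Kimi K2.5
    (15.5, 0.30, 2.50),    # Gemini 2.5 Flash
    (20.5, 0.40, 1.20),    # DeepSeek V3.2
    (71.2, 0.10, 0.30),    # Step 3.5 Flash
    (11.9, 0.22, 0.85),    # Trinity Large Preview
    (12.4, 0.09, 0.29),    # MiMo-V2-Flash
]


def total_bill(input_share: float) -> float:
    """Total monthly cost in dollars for a given input-token share."""
    output_share = 1.0 - input_share
    return sum(
        tokens_b * 1_000 * (input_share * p_in + output_share * p_out)
        for tokens_b, p_in, p_out in PRICED_ROWS
    )


baseline = total_bill(0.72)
for share in (0.72, 0.60, 0.50):
    bill = total_bill(share)
    change = 100.0 * (bill / baseline - 1.0)
    print(f"{share:.0%} input: ${bill / 1e6:.2f}M ({change:+.1f}% vs 72/28)")
```

Because every priced model charges more for output than input, the total rises linearly as the assumed output share grows, so any other split can be tested by changing the values in the loop.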