Hermes Agent: what 1.96T tokens actually cost
Nous Research's Hermes Agent is an open-source self-improving agent that routes every call through OpenRouter. The routing is visible — which means we can see exactly which models carry the load, and price the bill. Here's the 30-day mix, ranked two ways: total spend and blended cost per million tokens.
- **1.96T** total tokens (30d)
- **$3.15M** monthly cost
- **#2** global OpenRouter rank
- **255** distinct models used
The expensive thing isn't the cheap thing
- Claude Opus 4.6 is Hermes Agent's single largest line item at $1.12M/month — 35.7% of known spend — despite being 4th by token volume. A 106B-token slice of premium-priced Opus beats a 510B-token flood of Qwen on raw dollars.
- The two Qwen3.6 Plus lanes (main + preview) combined move 561.9B tokens — the biggest single-family workload — but cost ~$438K, still less than half of Opus 4.6's bill.
- Cheapest non-free workhorse: MiMo-V2-Flash at $0.15/M blended. Step 3.5 Flash is next at $0.16/M. An agent willing to route tool-use rounds to these would pay ~1.5% of Opus 4.6's blended rate — a ~98% cut in per-token spend.
- Two free models (Nemotron 3 Super, MiniMax M2.5) absorb 127.2B tokens of load at zero cost — which means Hermes's "real" paid token volume is closer to ~1.83T.
Ranked by monthly spend
Estimated cost = tokens × blended $/M, where blended $/M uses a 72/28 input/output split (agentic tool-use profile). Pricing pulled from openrouter.ai/api/v1/models on 2026-04-14.
| # | Model | Vendor | Tokens | $/M in | $/M out | Blended $/M | Monthly cost | % of spend |
|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 | Anthropic | 106B | $5.00 | $25.00 | $10.60 | $1.12M | 35.7% |
| 2 | MiMo-V2-Pro | Xiaomi | 482B | $1.00 | $3.00 | $1.56 | $752K | 23.9% |
| 3 | Qwen3.6 Plus | Qwen | 510B | $0.33 | $1.95 | $0.78 | $398K | 12.6% |
| 4 | GPT-5.4 | OpenAI | 45.5B | $2.50 | $15.00 | $6.00 | $273K | 8.7% |
| 5 | Claude Sonnet 4.6 | Anthropic | 37.5B | $3.00 | $15.00 | $6.36 | $239K | 7.6% |
| 6 | MiniMax M2.7 | MiniMax | 132B | $0.30 | $1.20 | $0.55 | $73K | 2.3% |
| 7 | Gemini 3 Flash Preview | Google | 52.5B | $0.50 | $3.00 | $1.20 | $63K | 2.0% |
| 8 | Gemini 3.1 Pro Preview | Google | 11.4B | $2.00 | $12.00 | $4.80 | $55K | 1.7% |
| 9 | Qwen3.6 Plus Preview | Qwen | 51.9B | $0.33 | $1.95 | $0.78 | $40K | 1.3% |
| 10 | GLM 5.1 | Z.ai | 25.7B | $0.95 | $3.15 | $1.57 | $40K | 1.3% |
| 11 | GPT-5.4 Mini | OpenAI | 13.4B | $0.75 | $4.50 | $1.80 | $24K | 0.8% |
| 12 | Kimi K2.5 | Moonshot AI | 30.4B | $0.38 | $1.72 | $0.76 | $23K | 0.7% |
| 13 | Gemini 2.5 Flash | Google | 15.5B | $0.30 | $2.50 | $0.92 | $14K | 0.5% |
| 14 | DeepSeek V3.2 | DeepSeek | 20.5B | $0.40 | $1.20 | $0.62 | $13K | 0.4% |
| 15 | Step 3.5 Flash | StepFun | 71.2B | $0.10 | $0.30 | $0.16 | $11K | 0.4% |
| 16 | Trinity Large Preview | Arcee AI | 11.9B | $0.22 | $0.85 | $0.40 | $5K | 0.1% |
| 17 | MiMo-V2-Flash | Xiaomi | 12.4B | $0.09 | $0.29 | $0.15 | $2K | 0.1% |
| 18 | Nemotron 3 Super (free) | NVIDIA | 102B | $0.00 | $0.00 | $0.00 | $0 | 0.0% |
| 19 | MiniMax M2.5 (free) | MiniMax | 25.2B | $0.00 | $0.00 | $0.00 | $0 | 0.0% |
| 20 | Hunter Alpha | OpenRouter | 23.9B | — | — | — | — | — |
| — | Total (known-price rows) | — | 1.76T | — | — | — | $3.15M | 100% |
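The table's arithmetic is simple to sanity-check. A minimal sketch, using the Claude Opus 4.6 row's prices and token count from the table and the article's 72/28 split assumption:

```python
# Reproduce the table's blended $/M and monthly cost for one row.
# Prices and token counts come from the table; the 72/28 input/output
# split is the article's modeling assumption.
IN_SHARE, OUT_SHARE = 0.72, 0.28

def blended_per_m(price_in: float, price_out: float) -> float:
    """Blended $ per million tokens under the assumed input/output split."""
    return IN_SHARE * price_in + OUT_SHARE * price_out

def monthly_cost(tokens_billions: float, price_in: float, price_out: float) -> float:
    """Estimated monthly cost in dollars: tokens x blended $/M."""
    return tokens_billions * 1_000 * blended_per_m(price_in, price_out)

# Claude Opus 4.6: $5.00 in / $25.00 out, 106B tokens
print(round(blended_per_m(5.00, 25.00), 2))             # 10.6
print(round(monthly_cost(106, 5.00, 25.00) / 1e6, 2))   # 1.12
```

The same two functions reproduce every priced row in the table to within rounding.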
Ranked by blended cost per million tokens
Pure price efficiency — free models excluded. Lower is cheaper. This is the lens an agent router should use if it's delegating cheap reasoning steps.
| # | Model | Vendor | Blended $/M | Tokens | vs Opus 4.6 |
|---|---|---|---|---|---|
| 1 | MiMo-V2-Flash | Xiaomi | $0.15 | 12.4B | 1.4% |
| 2 | Step 3.5 Flash | StepFun | $0.16 | 71.2B | 1.5% |
| 3 | Trinity Large Preview | Arcee AI | $0.40 | 11.9B | 3.7% |
| 4 | MiniMax M2.7 | MiniMax | $0.55 | 132B | 5.2% |
| 5 | DeepSeek V3.2 | DeepSeek | $0.62 | 20.5B | 5.9% |
| 6 | Kimi K2.5 | Moonshot AI | $0.76 | 30.4B | 7.1% |
| 7 | Qwen3.6 Plus | Qwen | $0.78 | 510B | 7.4% |
| 8 | Qwen3.6 Plus Preview | Qwen | $0.78 | 51.9B | 7.4% |
| 9 | Gemini 2.5 Flash | Google | $0.92 | 15.5B | 8.6% |
| 10 | Gemini 3 Flash Preview | Google | $1.20 | 52.5B | 11.3% |
| 11 | MiMo-V2-Pro | Xiaomi | $1.56 | 482B | 14.7% |
| 12 | GLM 5.1 | Z.ai | $1.57 | 25.7B | 14.8% |
| 13 | GPT-5.4 Mini | OpenAI | $1.80 | 13.4B | 17.0% |
| 14 | Gemini 3.1 Pro Preview | Google | $4.80 | 11.4B | 45.3% |
| 15 | GPT-5.4 | OpenAI | $6.00 | 45.5B | 56.6% |
| 16 | Claude Sonnet 4.6 | Anthropic | $6.36 | 37.5B | 60.0% |
| 17 | Claude Opus 4.6 | Anthropic | $10.60 | 106B | 100.0% |
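The routing lens above can be sketched in a few lines. The tier assignments below are illustrative assumptions, not Hermes Agent's actual policy; the blended rates are copied from the table:

```python
# A minimal price-aware routing sketch: given a task "tier", pick the
# cheapest model in that tier. Tiers here are hypothetical labels;
# blended $/M rates come from the efficiency table.
CATALOG = [
    # (model, blended $/M, tier)
    ("MiMo-V2-Flash",     0.15,  "light"),
    ("Step 3.5 Flash",    0.16,  "light"),
    ("Qwen3.6 Plus",      0.78,  "mid"),
    ("MiMo-V2-Pro",       1.56,  "mid"),
    ("Claude Sonnet 4.6", 6.36,  "heavy"),
    ("Claude Opus 4.6",   10.60, "heavy"),
]

def route(tier: str) -> str:
    """Return the cheapest listed model for the requested tier."""
    candidates = [(price, name) for name, price, t in CATALOG if t == tier]
    return min(candidates)[1]

print(route("light"))  # MiMo-V2-Flash
print(route("heavy"))  # Claude Sonnet 4.6
```

Even this toy version shows the spread: a "light" round costs 1.4% of what the same tokens cost on Opus 4.6.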
Methodology & caveats
- Token counts are the 30-day figures reported on openrouter.ai/apps/hermes-agent (top-20 slice, 1.96T total across 255 models).
- Pricing comes from the public `/api/v1/models` endpoint on 2026-04-14. Where OpenRouter returns multiple variants, we pick the default (non-preview, non-fast).
- The 72/28 input/output split is a modeling assumption; OpenRouter exposes total tokens but not the direction. Agentic workloads tend to be context-heavy because every tool round replays the conversation. Flipping to 50/50 raises the known-price bill by ~39%; 60/40 raises it by ~21%.
- Hunter Alpha is an unpriced alpha model — no cost row. Once it graduates to a priced slot, the total will shift.
- Preview vs stable: Qwen3.6 Plus Preview is priced off the stable Qwen3.6 Plus lane since OpenRouter doesn't surface the preview separately. Trinity Large Preview is listed as free on OpenRouter; we price it at the non-free Trinity Large Thinking rate to keep the comparison apples-to-apples.
- Not a quality ranking. Cost-per-token says nothing about whether a model actually solves the task. See our agentic benchmarks for SWE-bench, BinaryAudit, OTelBench, and YC-Bench scores on most of these models.
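The split sensitivity is reproducible from the spend table alone. A sketch that recomputes the known-price bill under alternative splits (per-model prices and token counts copied from the table; only the input-share fraction varies):

```python
# Recompute the known-price monthly bill under different input/output splits.
# (model, tokens in billions, $/M input, $/M output), copied from the table.
ROWS = [
    ("Claude Opus 4.6", 106, 5.00, 25.00),
    ("MiMo-V2-Pro", 482, 1.00, 3.00),
    ("Qwen3.6 Plus", 510, 0.33, 1.95),
    ("GPT-5.4", 45.5, 2.50, 15.00),
    ("Claude Sonnet 4.6", 37.5, 3.00, 15.00),
    ("MiniMax M2.7", 132, 0.30, 1.20),
    ("Gemini 3 Flash Preview", 52.5, 0.50, 3.00),
    ("Gemini 3.1 Pro Preview", 11.4, 2.00, 12.00),
    ("Qwen3.6 Plus Preview", 51.9, 0.33, 1.95),
    ("GLM 5.1", 25.7, 0.95, 3.15),
    ("GPT-5.4 Mini", 13.4, 0.75, 4.50),
    ("Kimi K2.5", 30.4, 0.38, 1.72),
    ("Gemini 2.5 Flash", 15.5, 0.30, 2.50),
    ("DeepSeek V3.2", 20.5, 0.40, 1.20),
    ("Step 3.5 Flash", 71.2, 0.10, 0.30),
    ("Trinity Large Preview", 11.9, 0.22, 0.85),
    ("MiMo-V2-Flash", 12.4, 0.09, 0.29),
]

def total_bill(input_share: float) -> float:
    """Known-price monthly bill in dollars for a given input-token share."""
    out_share = 1.0 - input_share
    return sum(
        tokens_b * 1_000 * (input_share * p_in + out_share * p_out)
        for _, tokens_b, p_in, p_out in ROWS
    )

base = total_bill(0.72)
print(f"72/28: ${base / 1e6:.2f}M")
print(f"50/50: +{total_bill(0.50) / base - 1:.0%}")
print(f"60/40: +{total_bill(0.60) / base - 1:.0%}")
```

Because output tokens are priced several times higher than input tokens on nearly every model, any shift toward output makes the bill grow quickly; the 72/28 assumption is the optimistic end of the range.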
Know Hermes Agent's real cost structure?
If our 72/28 input/output split assumption is off, or you see a model we missed, tell us. We'll re-run the analysis.