Codesota · Hardware · Vol. II
Issue: April 22, 2026
Live registry · consumer · datacenter · edge

Silicon, priced honestly.
Every chip, every watt.

The register of inference and training hardware — RTX 5090, H200, B200, MI300X — with real FP16 TFLOPS, VRAM, memory bandwidth, and the cloud hourly you actually see on Vast.ai, RunPod and AWS. Numbers are reproduced at matched quantisation and reported with latency and p95.

Read the leaderboard · VRAM by model
Prices verified April 2026 · no affiliate
§ 00¼ · Per-chip pages

One page per accelerator.

Specs, every measured benchmark, VRAM-fit table, and the head-to-heads — for each chip that matters in 2026.

Consumer · NVIDIA

RTX 5090

32 GB · 209.5 FP16 TFLOPS · 575 W

First consumer card with 32 GB. The ceiling for a single-PSU workstation.

Consumer · NVIDIA

RTX 4090

24 GB · 165.2 FP16 TFLOPS · 450 W

Still the workhorse: 24 GB GDDR6X, $0.29/hr on Vast.ai spot.

Consumer · NVIDIA

RTX 3090

24 GB · 35.6 FP16 TFLOPS · 350 W

Best sub-$1k ML card on the used market. 24 GB holds a 32B at INT4 with headroom; a 70B at INT4 needs two of them.

Datacenter · NVIDIA

H200

141 GB · 989 FP16 TFLOPS · 700 W

141 GB HBM3e. The first datacenter card to hold a 70B's FP16 weights (140 GB) on a single GPU.

Datacenter · NVIDIA

H100 SXM

80 GB · 989 FP16 TFLOPS · 700 W

989 FP16 TFLOPS. The benchmark every other accelerator is measured against.

Datacenter · NVIDIA

B200

192 GB · 4,500 FP16 TFLOPS · 1000 W

4,500 FP16 TFLOPS. The current top of the leaderboard, when you can find one.

Datacenter · AMD

MI300X

192 GB · 2,615 FP16 TFLOPS · 750 W

192 GB HBM3, 2,615 FP16 TFLOPS. AMD’s answer to H200 — and it fits more.

§ 00½ · Head-to-head

X vs Y, same workloads.

Side-by-side spec tables and matched-quantisation throughput numbers. The matchups people actually search for.

/hardware/h200-vs-b200

H200 vs B200

989 vs 4,500 FP16 · 141 GB vs 192 GB

Hopper’s last word against Blackwell’s first. 4.5× the FP16 and 51 GB more VRAM (141 → 192 GB).

/hardware/h100-vs-h200

H100 SXM vs H200

989 vs 989 FP16 · 80 GB vs 141 GB

Same FP16 ceiling (989 TFLOPS), but the H200 carries 76% more VRAM (80 → 141 GB) at 1.4× the bandwidth.

/hardware/rtx-5090-vs-rtx-4090

RTX 5090 vs RTX 4090

209.5 vs 165.2 FP16 · 32 GB vs 24 GB

Blackwell vs Ada. 32 GB GDDR7 against 24 GB GDDR6X, at 1.27× the FP16.

/hardware/rtx-5090-vs-h200

RTX 5090 vs H200

209.5 vs 989 FP16 · 32 GB vs 141 GB

The biggest consumer card vs a real datacenter accelerator. When does the 5090 actually catch up?

/hardware/h100-vs-mi300x

H100 SXM vs MI300X

989 vs 2,615 FP16 · 80 GB vs 192 GB

NVIDIA vs AMD at the datacenter. 80 GB HBM3 vs 192 GB HBM3, 989 FP16 TFLOPS vs 2,615.
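Every ratio quoted in these matchups falls straight out of the spec pairs above. A minimal sketch; the spec dicts are transcribed from this register, and `versus` is a made-up helper for illustration:

```python
# Spec-pair ratios behind the head-to-head pages, transcribed from the register.
SPECS = {
    "H100 SXM": {"fp16_tflops": 989,   "vram_gb": 80,  "tdp_w": 700},
    "H200":     {"fp16_tflops": 989,   "vram_gb": 141, "tdp_w": 700},
    "B200":     {"fp16_tflops": 4500,  "vram_gb": 192, "tdp_w": 1000},
    "MI300X":   {"fp16_tflops": 2615,  "vram_gb": 192, "tdp_w": 750},
    "RTX 5090": {"fp16_tflops": 209.5, "vram_gb": 32,  "tdp_w": 575},
    "RTX 4090": {"fp16_tflops": 165.2, "vram_gb": 24,  "tdp_w": 450},
}

def versus(a: str, b: str) -> None:
    """Print b's specs as multiples of a's, one line per metric."""
    for key, label in [("fp16_tflops", "FP16"), ("vram_gb", "VRAM"), ("tdp_w", "TDP")]:
        print(f"{b} vs {a} · {label}: {SPECS[b][key] / SPECS[a][key]:.2f}x")

versus("H200", "B200")          # FP16 4.55x · VRAM 1.36x · TDP 1.43x
versus("RTX 4090", "RTX 5090")  # FP16 1.27x · VRAM 1.33x · TDP 1.28x
```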

§ 01 · Leaderboard

Accelerators, ranked.

FP16 TFLOPS is the blunt instrument — useful for ordering, useless for predicting any one workload. Read it next to VRAM, bandwidth, and the hourly price you can actually rent the card for.


Tier: Consumer · Datacenter
Metric: FP16 dense · higher is better
Sources: NVIDIA · AMD datasheets
Updated: April 2026
How we test hardware →
Top-9 · April 2026
#  | Chip      | Vendor | Tier       | VRAM         | TDP     | FP16 TFLOPS | Cloud $/hr
01 | B200      | NVIDIA | Datacenter | 192 GB HBM3e | 1,000 W | 4,500       | ~$6+/hr
02 | MI300X    | AMD    | Datacenter | 192 GB HBM3  | 750 W   | 2,615       | ~$3–5/hr
03 | H200      | NVIDIA | Datacenter | 141 GB HBM3e | 700 W   | 989         | ~$3.70/hr
04 | H100 SXM  | NVIDIA | Datacenter | 80 GB HBM3   | 700 W   | 989         | ~$2.50/hr
05 | A100 80GB | NVIDIA | Datacenter | 80 GB HBM2e  | 400 W   | 312         | ~$2.00/hr
06 | RTX 5090  | NVIDIA | Consumer   | 32 GB GDDR7  | 575 W   | 209.5       | n/a
07 | RTX 4090  | NVIDIA | Consumer   | 24 GB GDDR6X | 450 W   | 165.2       | $0.29–0.59/hr
08 | RTX 5080  | NVIDIA | Consumer   | 16 GB GDDR7  | 360 W   | 112.6       | n/a
09 | RTX 3090  | NVIDIA | Consumer   | 24 GB GDDR6X | 350 W   | 35.6        | $0.29–0.34/hr
Fig 2 · Dense FP16 from vendor datasheets. Architectural lineage runs Ampere → Ada → Blackwell on the NVIDIA side and CDNA2 → CDNA3 at AMD; the B200 heads the register today.
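Reading FP16 beside the rental column is one division. A sketch over the priced rows of Fig 2, with range prices collapsed to single points (an assumption for illustration, not a published rate):

```python
# Dense FP16 TFLOPS per rented dollar-hour, from the Fig 2 rows with a price.
# Range prices are collapsed to a single point here -- an illustrative assumption.
chips = [
    ("B200",      4500, 6.00),   # "~$6+/hr" taken at its floor
    ("MI300X",    2615, 4.00),   # midpoint of ~$3-5/hr
    ("H200",       989, 3.70),
    ("H100 SXM",   989, 2.50),
    ("A100 80GB",  312, 2.00),
    ("RTX 4090", 165.2, 0.29),   # cheapest verified spot
]
for name, tflops, usd_hr in sorted(chips, key=lambda c: c[1] / c[2], reverse=True):
    print(f"{name:10s} {tflops / usd_hr:6.0f} TFLOPS per $/hr")
# At these list rates the B200 leads even per dollar-hour (~750), but the
# 4090's spot price (~570) keeps a consumer card within reach of HBM silicon.
```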
§ 02 · Workloads

Eleven workloads. One card at a time.

The same card is fast at LLM serving and slow at diffusion — it depends on the bottleneck. Each panel is a real benchmark; each headline figure is the current best number we have reproduced on a consumer RTX 5090 or cheapest-cloud equivalent.

Llama 3.1 8B: 140 tok/s · consumer · higher ↑
Llama 3.1 70B (4-bit): 38 tok/s · higher ↑
Qwen 2.5 32B (4-bit): 48 tok/s · higher ↑
Mistral 7B: 165 tok/s · consumer · higher ↑
SDXL 1024: 6.5 it/s · image gen · higher ↑
Flux.1 Dev: 3.4 it/s · image gen · higher ↑
LoRA FT (Llama 8B): 12.5 samples/s · higher ↑
ResNet-50 (ImageNet): 2,800 img/s · higher ↑
YOLOv8x: 320 FPS · detection · higher ↑
SAM ViT-H: 9.2 masks/s · segment · higher ↑
Whisper-Lv3: 28× real-time · transcribe · higher ↑
Cheapest 4090: $0.29/hr · cloud · lower ↓
Fig 3 · Best published throughput over recent generations (2023–26), per workload. Each figure is the current state of the art on a single consumer card (or the cheapest cloud hour, for the pricing panel).
§ 03 · Procurement

Cloud, on-prem, or edge.

Three honest lanes for ML hardware in 2026. The right answer is usually two of them — cloud for the bursts, on-prem for the grind, edge for the field.

Rent by the hour

Cloud.

The default for anything that isn't sustained. Vast.ai and RunPod community tiers bring an RTX 4090 to $0.29/hr; an H100 to $1.65–2.99/hr.

  • Zero capex. No racks, no PSU maths, no NVLink wiring.
  • Spot / community = cheap but interruptible — fine for experimentation.
  • Hyperscalers (AWS, GCP) charge 2–3× what aggregators do for the same silicon.
  • Break-even vs owned consumer card: ~3–8 months at 20–40 hrs/wk usage.
Best for: prototyping, bursty training, any H100/H200/B200 need, teams < 5.
Own the card

On-prem.

A used RTX 3090 at $700–900 or a new RTX 5090 at $1,999 pays for itself inside a year at 20 hrs/week, before electricity. An H100 80GB at ~$30k does not — keep those in the cloud.

  • No egress fees. Data never leaves the building.
  • Sustained 24/7 workloads tip the maths against cloud for consumer cards.
  • Electricity is the real running cost: ~$55/yr (3090) to ~$700/yr (H100).
  • Apple Silicon unified memory is a niche but real on-prem lane for local 70B inference.
Best for: steady single-card inference, privacy-sensitive data, local dev loops.
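The electricity figures above are reproducible from TDP alone. A sketch assuming full TDP whenever the card is in use, at the $0.15/kWh rate the break-even table in § 07 also assumes:

```python
# Yearly electricity for a card, assuming full TDP while in use and the
# $0.15/kWh rate used by the break-even table in section 07.
def electricity_per_year(tdp_watts: float, hours_per_week: float,
                         usd_per_kwh: float = 0.15) -> float:
    kwh = tdp_watts / 1000 * hours_per_week * 52
    return kwh * usd_per_kwh

print(electricity_per_year(350, 20))      # RTX 3090, hobbyist: ~$54.6/yr
print(electricity_per_year(700, 24 * 7))  # H100 flat out 24/7: ~$917/yr
# The register's ~$700/yr H100 figure implies below-full average draw;
# full-TDP 24/7 is the worst case.
```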
Ship with the product

Edge.

Hailo-10H, Jetson Orin, and Apple Neural Engine run quantised LLMs and vision models under 5–15 W — often the only option for devices that cannot reach a datacenter.

  • Throughput is tiny compared to datacenter, but so is latency and power.
  • Quantisation (INT4/INT8) is mandatory, not optional.
  • Unit economics compound with volume: a chip per device, not per query.
  • See /embedded-ai for the Hailo catalog and on-chip LLM quality benchmarks.
Best for: robotics, phones, factories, anywhere round-trip latency is the enemy.
§ 04 · Consumer

Consumer cards, specified.

The GPUs most ML engineers can actually buy and put in a workstation. The RTX 5090 is the first consumer card with 32 GB; the used 3090 is the first consumer card to stop being expensive.

Card     | Arch         | VRAM         | Bandwidth  | FP16  | TDP   | Price          | Status
RTX 3090 | Ampere       | 24 GB GDDR6X | 936 GB/s   | 35.6  | 350 W | ~$700–900 used | Used market
RTX 4090 | Ada Lovelace | 24 GB GDDR6X | 1,008 GB/s | 165.2 | 450 W | $1,599 MSRP    | Available
RTX 5080 | Blackwell    | 16 GB GDDR7  | 960 GB/s   | 112.6 | 360 W | $999 MSRP      | Available
RTX 5090 | Blackwell    | 32 GB GDDR7  | 1,792 GB/s | 209.5 | 575 W | $1,999 MSRP    | Available
Datacenter register
Chip      | VRAM         | FP16  | $/hr
A100 80GB | 80 GB HBM2e  | 312   | ~$2.00/hr cloud
H100 SXM  | 80 GB HBM3   | 989   | ~$2.50/hr cloud
H200      | 141 GB HBM3e | 989   | ~$3.70/hr cloud
MI300X    | 192 GB HBM3  | 2,615 | ~$3–5/hr cloud
B200      | 192 GB HBM3e | 4,500 | ~$6+/hr cloud
Fig 4 · Consumer dense FP16 from NVIDIA datasheets; street prices from major retailers (April 2026). Used 3090 pricing drawn from second-hand listings at the time of publication.
§ 05 · VRAM

How much memory, really.

Weights in FP16 = 2 bytes × parameters. A 7B model takes ~14 GB, a 70B ~140 GB. INT4 quantisation cuts this 4× with small quality loss — which is how a 70B (~36 GB at INT4) comes within reach of consumer hardware at all.

Fine-tuning needs 3–4× more memory for gradients, optimiser state, and activations. 128k-context serving adds tens of GB of KV cache on top.

Model          | Params   | FP16   | INT8   | INT4   | LoRA FT | Fits on
Llama 3.1 8B   | 8B       | 16 GB  | 8 GB   | 4 GB   | 24 GB   | RTX 5080 (16 GB)
Qwen 2.5 14B   | 14B      | 28 GB  | 14 GB  | 7 GB   | 36 GB   | RTX 5080 (16 GB)
Qwen 2.5 32B   | 32B      | 64 GB  | 32 GB  | 16 GB  | 48 GB   | RTX 4090 (24 GB)
Llama 3.1 70B  | 70B      | 140 GB | 70 GB  | 36 GB  | 80 GB   | A100/H100 (80 GB)
DeepSeek V3    | 671B MoE | 1.3 TB | 671 GB | 336 GB | n/a     | Multi-GPU
Llama 3.1 405B | 405B     | 810 GB | 405 GB | 203 GB | n/a     | Multi-GPU
Fig 5 · Memory budget per model at each precision. "Fits on" is the smallest single card that holds the INT4 weights with headroom for a short context.
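The arithmetic behind Fig 5 is short enough to inline. A minimal sketch: weights are parameters × bits / 8, and the KV-cache term uses the standard 2 × layers × KV-heads × head-dim × tokens formula, with Llama 3.1 8B's published shape (32 layers, 8 KV heads, head dim 128) as the example:

```python
def weights_gb(params_b: float, bits: int) -> float:
    """Weight memory in GB: parameters x bits / 8."""
    return params_b * 1e9 * bits / 8 / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context: int, batch: int = 1, bytes_per: int = 2) -> float:
    """FP16 KV cache: 2 (K and V) x layers x KV heads x head dim x tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * context * batch * bytes_per / 1e9

print(weights_gb(70, 16))  # Llama 3.1 70B FP16 -> 140.0 GB, matching Fig 5
print(weights_gb(70, 4))   # INT4 -> 35.0 GB (the register rounds to 36)
# Llama 3.1 8B at 128k context: 32 layers, 8 KV heads (GQA), head_dim 128
print(kv_cache_gb(32, 8, 128, 131_072))  # ~17.2 GB of KV cache alone
```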
§ 06 · Cloud

The hourly spread.

Aggregators undercut hyperscalers on the same silicon by 2–4×. The gap is widest on the H100: $1.65/hr on Vast.ai against $6.88/hr on AWS.

Spot and community instances may be interrupted. Hyperscaler per-GPU equivalents are derived from multi-GPU instance list prices.

Provider    | GPU       | Type      | $/hr  | $/day | $/month (24/7)
Vast.ai     | RTX 4090  | spot      | $0.29 | $7    | $212
RunPod      | RTX 4090  | community | $0.34 | $8    | $248
RunPod      | RTX 4090  | secure    | $0.59 | $14   | $431
RunPod      | A100 80GB | on-demand | $1.99 | $48   | $1,453
Vast.ai     | A100 80GB | on-demand | $2.00 | $48   | $1,460
Lambda Labs | A100 80GB | on-demand | $2.49 | $60   | $1,818
Vast.ai     | H100      | on-demand | $1.65 | $40   | $1,205
RunPod      | H100      | on-demand | $2.49 | $60   | $1,818
Together AI | H100      | on-demand | $2.99 | $72   | $2,183
GCP         | H200      | spot      | $3.72 | $89   | $2,716
AWS         | H100      | on-demand | $6.88 | $165  | $5,022
Fig 6 · Publicly-listed hourly rates, April 2026. Vast.ai's $0.29 RTX 4090 spot is the cheapest verified entry. Prices fluctuate — always reconfirm with the provider before you fill the bucket.
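The derived columns of Fig 6 are plain multiplication: 24 hours to a day, 730 hours (8,760 / 12) to a 24/7 month. A two-line check against the table's own rows:

```python
# Fig 6's derived columns: a 24/7 month is 730 hours (8,760 / 12).
def day_month(usd_per_hour: float) -> tuple[float, float]:
    return usd_per_hour * 24, usd_per_hour * 730

print(day_month(0.29))  # Vast.ai RTX 4090 spot -> (6.96, 211.70); table: $7 / $212
print(day_month(6.88))  # AWS H100 on-demand  -> (165.12, 5022.40); table: $165 / $5,022
```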
§ 07 · Budget

Under a grand, and the break-even.

Used RTX 3090 remains the best sub-$1k ML card — 24 GB runs a 32B at 4-bit with headroom, and a 70B at INT4 (~36 GB) needs two of them. The RTX 5060 is the cheapest new card if VRAM needs stay under 8 GB.

Break-even assumes $0.15/kWh, realistic utilisation during usage hours, and the cheapest available cloud rate at the same specification.

Budget cards · sub-$1k
Card             | VRAM  | FP16 TFLOPS | TDP   | Price | $/TFLOP | VRAM MB/$
RTX 5060         | 8 GB  | 19.2        | 145 W | $299  | $15.6   | 26.8
RTX 4060 Ti 16GB | 16 GB | 22          | 165 W | $600  | $27.3   | 26.7
RTX 5070 Ti      | 16 GB | 55          | 250 W | $900  | $16.4   | 17.8
RTX 3090 (used)  | 24 GB | 35.6        | 350 W | $800  | $22.5   | 30.0
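Both value columns are simple ratios: price over dense FP16 TFLOPS, and VRAM (taken as GB × 1,000 MB here) over price. A quick reproduction:

```python
# Budget-table value metrics: dollars per dense FP16 TFLOP, and VRAM MB per dollar.
cards = [("RTX 5060", 8, 19.2, 299), ("RTX 4060 Ti 16GB", 16, 22, 600),
         ("RTX 5070 Ti", 16, 55, 900), ("RTX 3090 (used)", 24, 35.6, 800)]
for name, vram_gb, tflops, usd in cards:
    print(f"{name:18s} ${usd / tflops:5.1f}/TFLOP  {vram_gb * 1000 / usd:5.1f} MB/$")
# RTX 5060: $15.6/TFLOP and 26.8 MB/$, matching the table's columns.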
Buy vs rent · break-even
Scenario             | Card          | Upfront  | Electricity/yr | Cloud equiv/yr  | Break-even
Hobbyist · 20 hrs/wk | RTX 3090 used | $800     | ~$55           | $220 (Vast.ai)  | ~5 months
Hobbyist · 20 hrs/wk | RTX 5090      | $1,999   | ~$90           | $350 (RunPod)   | ~8 months
Startup · 40 hrs/wk  | RTX 4090      | $1,599   | ~$140          | $700 (RunPod)   | ~3 months
Startup · H100 needs | H100 80GB     | ~$30,000 | ~$700          | $5,200 (RunPod) | ~7 years
Fig 7 · Upfront cards at street price; cloud equivalents at the cheapest verified rate for the same card. At 40 hrs/week on H100, cloud wins for almost every startup — the maths only flips when the GPU runs 24/7 sustained.
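A sketch of the buy-vs-rent arithmetic, as straight payback against the cheapest cloud rate. No resale value, depreciation, or cost of capital, which is a simplifying assumption rather than the register's exact model:

```python
def break_even_months(upfront_usd: float, cloud_per_year: float,
                      electricity_per_year: float) -> float:
    """Months until cumulative cloud spend exceeds purchase plus running cost.

    Straight payback: ignores resale value, depreciation, and cost of
    capital -- a simplifying assumption, not the register's exact model.
    """
    monthly_saving = (cloud_per_year - electricity_per_year) / 12
    return upfront_usd / monthly_saving

# H100 row of Fig 7: $30k upfront, $5,200/yr cloud, ~$700/yr power.
print(break_even_months(30_000, 5_200, 700) / 12)  # ~6.7 years ("~7 years")
```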
§ 08 · Apple Silicon

Unified memory, at a cost.

M-series chips trade throughput for envelope: you get 128–192 GB of unified memory on a single quiet workstation, but training is typically 3–5× slower than an equivalent NVIDIA card.

MLX beats PyTorch MPS for Apple-optimised inference. No NVLink-style multi-GPU scaling; not recommended for production training.
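A local-inference smoke test via MLX is a few lines. A sketch assuming the mlx-lm package and its documented load/generate helpers; the model identifier is illustrative, and any 4-bit mlx-community conversion would do:

```python
# Minimal MLX inference on Apple Silicon -- requires `pip install mlx-lm`.
# The model identifier below is illustrative, not an endorsement.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")
text = generate(model, tokenizer,
                prompt="Summarise the tradeoffs of unified memory for LLM inference.",
                max_tokens=128)
print(text)
```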

Apple Silicon

M4 Max · Mac Studio

Unified memory: up to 128 GB
Memory bandwidth: 546 GB/s
GPU cores: 40-core
Price (128 GB): ~$4,000
Single-core (GB6): ~4,054

Runs Llama 3.1 70B at INT8 (70 GB of weights) in 128 GB unified memory — no consumer NVIDIA GPU can match that envelope.

Apple Silicon

M3 Ultra · Mac Studio/Pro

Unified memory: up to 192 GB
Memory bandwidth: 800 GB/s
GPU cores: 60/76-core
Price (192 GB): ~$7,000+
Multi-core (GB6): ~28,169

192 GB of unified memory fits models that would otherwise need datacenter HBM. ~28% faster than the M4 Max in sustained workloads.

§ 09 · TOPS

The other unit.

FP16 TFLOPS dominates the datacenter conversation. TOPS — trillions of operations per second, usually integer — is what NPU vendors quote and what OEMs design Copilot+ laptops, on-device LLMs, and smart cameras against. Same silicon, different stage of the workload.

Numbers are vendor-published peak figures. Precision matters: INT4 entries flatter the chip vs INT8 peers, and "sparse" roughly doubles dense.


Top entry: NVIDIA H100 SXM · 3,958 TOPS
Tiers: 5 · Rows: 31 · Verified: vendor cards · datasheets

Datacenter · 6 chips · top 3,958 TOPS
#  | Chip                        | Vendor      | Power  | Released | Source                   | TOPS  | Precision
01 | NVIDIA H100 SXM             | NVIDIA      | 700 W  | 2022     | NVIDIA datasheet         | 3,958 | INT8 (sparse)
02 | NVIDIA H100 SXM (dense)     | NVIDIA      | 700 W  | 2022     | NVIDIA datasheet         | 1,979 | INT8 (dense)
03 | Qualcomm Cloud AI 100 Ultra | Qualcomm    | 150 W  | 2024     | Qualcomm product page    | 870   | INT8
04 | Groq LPU v2                 | Groq        | ~300 W | 2024     | Groq specs               | 750   | INT8
05 | NVIDIA L40S                 | NVIDIA      | 350 W  | 2023     | NVIDIA datasheet         | 733   | INT8 (sparse)
06 | Tenstorrent Blackhole       | Tenstorrent | 300 W  | 2025     | Tenstorrent product page | 466   | FP8
Workstation / dev kit · 5 chips · top 1,000 TOPS
#  | Chip                           | Vendor | Power    | Released | Source                 | TOPS  | Precision
01 | NVIDIA Jetson AGX Thor         | NVIDIA | 40–130 W | 2025     | NVIDIA Thor            | 1,000 | FP4 / INT8 mix
02 | NVIDIA Jetson AGX Orin (64 GB) | NVIDIA | 15–60 W  | 2022     | NVIDIA Orin            | 275   | INT8 (sparse)
03 | NVIDIA Jetson Orin NX (16 GB)  | NVIDIA | 10–25 W  | 2023     | NVIDIA Orin            | 100   | INT8 (sparse)
04 | Tesla FSD HW4                  | Tesla  | ~70 W    | 2023     | Tesla FSD computer     | 100   | INT8
05 | NVIDIA Jetson Orin Nano (8 GB) | NVIDIA | 7–15 W   | 2024-12  | NVIDIA Orin Nano Super | 67    | INT8 (sparse)
Laptop NPU · 6 chips · top 80 TOPS
#  | Chip                                   | Vendor   | Power       | Released | Source                 | TOPS | Precision
01 | Qualcomm Snapdragon X2 Elite Extreme   | Qualcomm | SoC (~25 W) | 2025-10  | Qualcomm Snapdragon X2 | 80   | INT8
02 | Intel Panther Lake (Core Ultra 300V)   | Intel    | SoC (~17 W) | 2026     | Intel Panther Lake     | 50   | INT8
03 | AMD Ryzen AI Max+ Pro 395 (Strix Halo) | AMD      | SoC (~55 W) | 2025-01  | AMD Ryzen AI           | 50   | INT8
04 | AMD Ryzen AI 9 HX 370 (XDNA 2)         | AMD      | SoC (~28 W) | 2024-07  | AMD Ryzen AI 300       | 50   | INT8
05 | Intel Lunar Lake (Core Ultra 200V)     | Intel    | SoC (~17 W) | 2024-09  | Intel Lunar Lake       | 48   | INT8
06 | Qualcomm Snapdragon X Elite (Gen 1)    | Qualcomm | SoC (~23 W) | 2024-06  | Qualcomm Snapdragon X  | 45   | INT8
Mobile · 6 chips · top 70 TOPS
#  | Chip                              | Vendor   | Power | Released | Source                      | TOPS | Precision
01 | Qualcomm Snapdragon 8 Elite Gen 2 | Qualcomm | SoC   | 2025-09  | Qualcomm Snapdragon 8 Elite | 70   | INT8
02 | MediaTek Dimensity 9400+          | MediaTek | SoC   | 2025-04  | MediaTek Dimensity 9400+    | 55   | INT8
03 | Qualcomm QCS8550                  | Qualcomm | SoC   | 2024     | Qualcomm QCS8550            | 48   | INT8
04 | Apple M5                          | Apple    | SoC   | 2025-10  | Apple M5                    | 38   | INT8
05 | Apple M4                          | Apple    | SoC   | 2024-05  | Apple M4                    | 38   | INT8
06 | Apple A18 Pro                     | Apple    | SoC   | 2024-09  | Apple A18 Pro               | 35   | INT8
Edge / embedded · 8 chips · top 40 TOPS
#  | Chip            | Vendor   | Power    | Released | Source          | TOPS | Precision
01 | Hailo-10H       | Hailo    | ~2.5 W   | 2025-07  | Hailo-10H       | 40   | INT4
02 | Hailo-8         | Hailo    | ~2.5 W   | 2021     | Hailo-8         | 26   | INT8
03 | Hailo-15H       | Hailo    | ~3–5 W   | 2023     | Hailo-15        | 20   | INT8
04 | Hailo-8L        | Hailo    | ~1.5 W   | 2023     | Hailo-8L        | 13   | INT8
05 | Hailo-15L       | Hailo    | ~2 W     | 2024     | Hailo-15L       | 7    | INT8
06 | Rockchip RK3588 | Rockchip | SoC      | 2022     | Rockchip RK3588 | 6    | INT8
07 | Arm Ethos-U85   | Arm      | IP block | 2024     | Arm Ethos-U85   | 4    | INT8
08 | Coral Edge TPU  | Google   | ~2 W     | 2019     | Coral docs      | 4    | INT8
A single scale spans all tiers: the H100's 3,958 TOPS dwarfs the Apple M5's 38. Within a tier, the leader sits first; ties are broken alphabetically. INT4 figures (Hailo-10H) trade accuracy for headroom; sparse INT8 (NVIDIA datacenter) doubles dense by skipping zeros.
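Dividing TOPS by watts reorders the tiers, which is the edge story in one operation. A sketch over a handful of rows above, using the register's approximate power figures; precisions differ per row, so treat the output as indicative rather than a ranking:

```python
# TOPS per watt across tiers, using the register's approximate power figures.
# Precision differs per row, so this is indicative, not a clean ranking.
rows = [("H100 SXM (sparse INT8)", 3958, 700), ("Hailo-10H (INT4)", 40, 2.5),
        ("Hailo-8 (INT8)", 26, 2.5), ("Snapdragon X2 Elite (INT8)", 80, 25),
        ("Jetson Orin NX (sparse INT8)", 100, 25)]
for name, tops, watts in sorted(rows, key=lambda r: r[1] / r[2], reverse=True):
    print(f"{name:30s} {tops / watts:5.1f} TOPS/W")
# The 2.5 W Hailo parts land at 10-16 TOPS/W against the H100's ~5.7 --
# the whole argument of the edge tier in one division.
```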
§ 10 · Edge / embedded

Five watts, and an LLM on chip.

For robotics, phones and factory cameras, the relevant question stops being "how many tokens per second on an H100?" and starts being "how many tokens per second under five watts, offline, with no round-trip to a datacenter?"

Hailo-10H · Jetson Orin · Apple Neural Engine · Qualcomm AI Hub. Quantised LLMs, vision backbones, and on-device transcription — read the register on /embedded-ai.
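At the edge the useful unit is energy per token rather than tokens per second. A back-of-envelope sketch; the throughput inputs are hypothetical placeholders, not measurements from this register:

```python
# Millijoules per generated token: power / throughput. The tok/s inputs
# below are hypothetical placeholders, not measurements from the register.
def mj_per_token(watts: float, tokens_per_s: float) -> float:
    return watts / tokens_per_s * 1000

print(mj_per_token(5, 10))     # 5 W edge NPU at 10 tok/s      -> 500 mJ/token
print(mj_per_token(700, 140))  # 700 W datacenter at 140 tok/s -> 5,000 mJ/token
# Ten times slower, ten times less energy per token, and no round trip.
```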

Embedded AI register
§ 11 · Methodology

How we measure silicon.

Most hardware charts cherry-pick a batch size, a quantisation, or a thermal envelope. Ours are built from three ordinary rules.

First, repeatable workloads. Each benchmark names the exact model revision, the exact quantisation (FP16, INT8, INT4 GPTQ/AWQ), the batch size, and the context length. Everything is reproducible from a single container.

Second, matched quantisation. A 5090 at INT4 versus an H100 at FP16 is not a comparison — it's two different models. Cross-card numbers in the same row always run at the same precision.

Third, latency with p95. Throughput is the headline; the footnote is what the slowest one-in-twenty request looks like. We report both, in the same row.
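The p95 rule is one sorted array. A minimal sketch of the harness shape; `run_request` is a stand-in for whichever serving stack is under test:

```python
import time
import numpy as np

def benchmark(run_request, n: int = 200) -> dict:
    """Time n requests; report mean throughput and p95 latency together."""
    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        run_request()  # stub: one inference call against the stack under test
        latencies.append(time.perf_counter() - t0)
    lat = np.array(latencies)
    return {"throughput_rps": 1 / lat.mean(),
            "p50_s": float(np.percentile(lat, 50)),
            "p95_s": float(np.percentile(lat, 95))}  # the slowest 1-in-20
```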

Cloud prices are verified directly with the provider on the date printed at the top of the issue. Vendor TFLOPS are dense FP16 from the published datasheet — not sparse, not "effective," not marketing.

§ 12 · Related

Read next, around the register.

/llm

LLM leaderboard

Frontier LLMs ranked on MMLU, GSM8K, HumanEval — the models these GPUs are serving.

/vision

Vision register

Detection, segmentation, classification — and the cards that run them at production FPS.

/ocr

OCR register

Document OCR benchmarks, CER leaderboard, and the hardware footprint per page.

/speech

Speech register

ASR and TTS leaderboards — Whisper, Parakeet, and the hardware it takes to hit × real-time.

/embedded-ai

Embedded AI register

Hailo, Jetson, ANE — LLMs and vision models that run under five watts.

/methodology

Methodology

How every number on Codesota is reproduced, dated, and preserved under regression.

/tasks

Task index

Every ML task in the register, with its canonical benchmark and trust grade.

/browse

Browse benchmarks

The full benchmark catalogue — by area, by modality, by size.