← Hardware · RTX 5090 vs H200 · Head-to-head · matched quantisation · Issue: April 22, 2026
Matched precision · same workloads · April 2026

RTX 5090 vs H200.

A $1,999 card on your desk against a $3.70/hr datacenter rental. The H200 has 4.7× the FP16 throughput, 4.4× the VRAM, and 2.7× the memory bandwidth — but the 5090 wins on $/token-served the moment your workload fits in 32 GB.
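The $/token claim can be sketched with the Llama 3.1 8B throughputs from Fig 3. The amortisation window, power draw cost, and utilisation level are illustrative assumptions, not measurements:

```python
HOURS_PER_MONTH = 730

def cost_per_mtok(hourly_cost_usd: float, tok_per_s: float) -> float:
    """Dollars per 1M tokens at sustained full utilisation."""
    tokens_per_hour = tok_per_s * 3600
    return hourly_cost_usd / tokens_per_hour * 1e6

# H200 rental at $3.70/hr, 280 tok/s on Llama 3.1 8B (Fig 3):
h200 = cost_per_mtok(3.70, 280)          # ≈ $3.67 / Mtok

# RTX 5090: $1,999 amortised over an assumed 3 years, plus an
# assumed ~$0.08/hr for power (575 W at ~$0.14/kWh):
card_hourly = 1999 / (3 * 12 * HOURS_PER_MONTH) + 0.08
rtx = cost_per_mtok(card_hourly, 140)    # ≈ $0.31 / Mtok
```

Under those assumptions the 5090 is roughly an order of magnitude cheaper per token — provided it stays busy.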

§ 01 · Specs

Side by side, on paper.

Datasheet specs only. Throughput on real workloads follows in §02 — the gap there is often smaller than the FP16 number suggests, because most ML workloads are memory-bound.
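The memory-bound point is easy to check with back-of-envelope arithmetic: at batch 1, decoding one token streams every weight through memory once, so the throughput ceiling is bandwidth divided by weight bytes — TFLOPS never enters. A minimal sketch, using the datasheet numbers above:

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, params_b: float,
                         bytes_per_param: float) -> float:
    # Batch-1 decode streams all weights per token, so the ceiling
    # is memory bandwidth / model size in bytes. Ignores KV cache.
    weight_gb = params_b * bytes_per_param
    return bandwidth_gb_s / weight_gb

# Llama 3.1 8B in FP16 (16 GB of weights):
rtx5090 = decode_ceiling_tok_s(1792, 8, 2)   # 112 tok/s ceiling
h200    = decode_ceiling_tok_s(4800, 8, 2)   # 300 tok/s ceiling
```

The ratio of those ceilings is ≈2.7× — the bandwidth gap, nowhere near the 4.7× FP16 gap. That is why the Fig 3 deltas cluster around 2× rather than 5×.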

| Spec | RTX 5090 | H200 |
| --- | --- | --- |
| Vendor | NVIDIA | NVIDIA |
| Tier | Consumer | Datacenter |
| Generation | Blackwell | Hopper |
| VRAM | 32 GB GDDR7 | 141 GB HBM3e |
| Bandwidth | 1,792 GB/s | 4,800 GB/s |
| FP16 dense | 209.5 TFLOPS | 989 TFLOPS |
| TDP | 575 W | 700 W |
| Released | 2025 | 2024 |
| Status | Available | Available |
| Price | $1,999 MSRP | ~$3.70/hr cloud |
Fig 2 · Spec deltas. Copper dot marks the column with the bigger number for that axis (lower W is better; otherwise higher).
§ 02 · Benchmarks

On real workloads.

Same model revision, same quantisation, same batch size on both cards. Where one side has no measurement we leave the cell empty rather than extrapolate.

Methodology: how we test.

| Category | Workload | Metric | RTX 5090 | H200 | Δ |
| --- | --- | --- | --- | --- | --- |
| LLM Inference | Llama 3.1 8B | tok/s | 140 | 280 | 2.00× |
| LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 38 | 78 | 2.05× |
| LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 48 | 95 | 1.98× |
| LLM Inference | Mistral 7B | tok/s | 165 | 320 | 1.94× |
| Image Generation | SDXL 1024×1024 | it/s | 6.5 | 11.2 | 1.72× |
| Image Generation | Flux.1 Dev | it/s | 3.4 | 5.9 | 1.74× |
| Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 12.5 | 26 | 2.08× |
| Training | ResNet-50 · ImageNet | img/s | 2,800 | 5,800 | 2.07× |
| Computer Vision | YOLOv8x · inference | FPS | 320 | 580 | 1.81× |
| Computer Vision | SAM ViT-H | masks/s | 9.2 | 16.5 | 1.79× |
| Audio/Video | Whisper Large v3 | × RT | 28 | 52 | 1.86× |
Fig 3 · Δ column shows H200 ÷ RTX 5090 on the same workload. Copper dot marks the winner per row.
§ 03 · Verdict

When each one wins.

The right card is the one whose envelope covers your worst-case workload — not the one with the bigger TFLOPS number.

Pick the RTX 5090

When the RTX 5090 wins.

Anything that fits in 32 GB and runs sustained. Local 70B INT4 inference, SDXL pipelines, smaller LoRA training — break-even on the upfront card vs cloud is under 12 months at 20 hrs/wk.
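The break-even claim is straightforward arithmetic on the two prices quoted above (power cost left out for simplicity):

```python
CARD_USD = 1999          # RTX 5090 MSRP
RENTAL_USD_HR = 3.70     # H200 cloud rate
HOURS_PER_WEEK = 20

# Weeks of rental spend that equal the card's sticker price:
weeks = CARD_USD / (RENTAL_USD_HR * HOURS_PER_WEEK)   # ~27 weeks
months = weeks / 4.345                                 # ~6 months
```

At 20 hrs/wk the card pays for itself in roughly six months, comfortably inside the 12-month figure; heavier use shortens it further.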

Pick the H200

When the H200 wins.

Models that don’t fit, batches that don’t finish. 70B in FP16, 128k-context serving, anything FP8-trained at scale. The H200 also wins outright when you need elasticity — burst a hundred GPUs for an afternoon.
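Why 128k-context serving forces the H200: the KV cache alone outgrows 32 GB. A rough sketch using Llama 3.1 70B's published shape (80 layers, 8 KV heads via GQA, head dim 128) and an FP16 cache — serving stacks vary, so treat this as an estimate:

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    # K and V each store layers * kv_heads * head_dim values per token.
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

cache = kv_cache_gb(80, 8, 128, 128_000)   # ~42 GB for one sequence
```

~42 GB of cache for a single 128k sequence exceeds the 5090's 32 GB before any weights load; the H200's 141 GB covers it with room for the model.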

Bottom line. Buy the 5090 for steady local workloads under 32 GB; rent H200 for the bursts and the bigger models. The two stack — they don’t replace each other.

§ 04 · More head-to-heads

Other matchups, same format.

/hardware/h200-vs-b200

H200 vs B200

Hopper’s last word against Blackwell’s first. 4.5× the FP16, almost 50% more VRAM bandwidth.

/hardware/h100-vs-h200

H100 SXM vs H200

Same FP16 ceiling (989 TFLOPS), but the H200 nearly doubles VRAM (80 → 141 GB) at 1.4× the bandwidth.

/hardware/rtx-5090-vs-rtx-4090

RTX 5090 vs RTX 4090

Blackwell vs Ada. 32 GB GDDR7 against 24 GB GDDR6X, at 1.27× the FP16.

Read next

Three places to go from here.

Per-chip page
RTX 5090
First consumer card with 32 GB. The ceiling for a single-PSU workstation.
Per-chip page
H200
141 GB HBM3e. The first datacenter card to fit a 70B in FP16 single-GPU.
Hub
Hardware register
Every accelerator on the leaderboard with FP16 TFLOPS, VRAM, $/hr, and energy cost.