A $1,999 card on your desk against a $3.70/hr datacenter rental. The H200 has 4.7× the FP16 throughput, 4.4× the VRAM, and 2.7× the memory bandwidth, yet the 5090 wins on $/token served the moment your workload fits in 32 GB.
Datasheet specs only; measured throughput on real workloads follows in §02. The gap there is often smaller than the FP16 number suggests, because most ML workloads are memory-bound; the bandwidth sketch after the table shows why.
| Spec | RTX 5090 | H200 |
|---|---|---|
| Vendor | NVIDIA | NVIDIA |
| Tier | Consumer | Datacenter |
| Generation | Blackwell | Hopper |
| VRAM | 32 GB GDDR7 | 141 GB HBM3e |
| Bandwidth | 1,792 GB/s | 4,800 GB/s |
| FP16 dense | 209.5 TFLOPS | 989 TFLOPS |
| TDP | 575 W | 700 W |
| Released | 2025 | 2024 |
| Status | Available | Available |
| Price | $1,999 MSRP | ~$3.70/hr cloud |
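To preview why bandwidth matters more than TFLOPS here, a back-of-envelope roofline for single-stream LLM decode. This is a sketch under two assumptions: each generated token streams the full weight set from VRAM, and a 4-bit 70B model occupies roughly 35 GB. The bandwidth figures come from the table above.

```python
# Rough decode ceiling for single-stream LLM inference: every generated token
# must stream the full weight set from VRAM, so tok/s <= bandwidth / weight bytes.
# Assumption: ~35 GB of weights for Llama 3.1 70B at 4-bit (70e9 params * 0.5 B).

bandwidth_gbs = {"RTX 5090": 1792, "H200": 4800}  # GB/s, from the spec table
weights_gb = 35.0

for card, bw in bandwidth_gbs.items():
    print(f"{card}: <= {bw / weights_gb:.0f} tok/s ceiling")
# RTX 5090: <= 51 tok/s (38 measured); H200: <= 137 tok/s (78 measured).
# The measured gap (78/38 ≈ 2.05x) tracks the 2.7x bandwidth ratio far more
# closely than the 4.7x FP16 ratio.
```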
Methodology: how we test.
Same model revision, same quantisation, same batch size on both cards. Where one side has no measurement we leave the cell empty rather than extrapolate.
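For illustration, a stripped-down single-stream tok/s harness in the spirit of those rules. This is a sketch, not the actual rig: the Hugging Face checkpoint id and generation lengths are assumptions, and real runs would pin library versions and average many generations.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # assumed checkpoint id

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16).to("cuda")

inputs = tok("Explain memory bandwidth in one paragraph.", return_tensors="pt").to("cuda")

model.generate(**inputs, max_new_tokens=32)  # warm-up: exclude compile/cache effects

torch.cuda.synchronize()
t0 = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / (time.perf_counter() - t0):.1f} tok/s")
```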
| Category | Workload | Metric | RTX 5090 | H200 | Δ (H200 ÷ 5090) |
|---|---|---|---|---|---|
| LLM Inference | Llama 3.1 8B | tok/s | 140 | 280 | 2.00× |
| LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 38 | 78 | 2.05× |
| LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 48 | 95 | 1.98× |
| LLM Inference | Mistral 7B | tok/s | 165 | 320 | 1.94× |
| Image Generation | SDXL 1024×1024 | it/s | 6.5 | 11.2 | 1.72× |
| Image Generation | Flux.1 Dev | it/s | 3.4 | 5.9 | 1.74× |
| Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 12.5 | 26 | 2.08× |
| Training | ResNet-50 · ImageNet | img/s | 2,800 | 5,800 | 2.07× |
| Computer Vision | YOLOv8x · inference | FPS | 320 | 580 | 1.81× |
| Computer Vision | SAM ViT-H | masks/s | 9.2 | 16.5 | 1.79× |
| Audio/Video | Whisper Large v3 | × RT | 28 | 52 | 1.86× |
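As a worked example of the $/token framing from the intro, here is a hedged cost-per-million-tokens comparison on the 70B 4-bit row. The rental rate and tok/s come from the tables; the 5090's lifetime, duty cycle, and electricity price are labeled assumptions.

```python
# $/million generated tokens, Llama 3.1 70B 4-bit, single stream.
H200_RATE = 3.70               # $/hr rental, from the spec table
H200_TPS, RTX_TPS = 78, 38     # tok/s, from the benchmark table

CARD = 1999.00                 # RTX 5090 MSRP
LIFETIME_HRS = 3 * 52 * 20     # assumption: 3-year life at 20 hrs/week
POWER_COST = 0.575 * 0.15      # assumption: 575 W at $0.15/kWh -> $/hr

def usd_per_mtok(rate_hr: float, tps: float) -> float:
    return rate_hr / (tps * 3600) * 1e6

rtx_rate = CARD / LIFETIME_HRS + POWER_COST
print(f"H200 rental: ${usd_per_mtok(H200_RATE, H200_TPS):.2f}/Mtok")  # ~$13.2
print(f"RTX 5090:    ${usd_per_mtok(rtx_rate, RTX_TPS):.2f}/Mtok")    # ~$5.3
# The 5090 serves tokens ~2.5x cheaper here despite being ~2x slower,
# which is the $/token-served point from the intro.
```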
The right card is the one whose envelope covers your worst-case workload — not the one with the bigger TFLOPS number.
When the RTX 5090 wins.
Anything that fits in 32 GB and runs sustained: local 70B INT4 inference, SDXL pipelines, smaller LoRA training. Break-even on the upfront card vs cloud comes in under 12 months at 20 hrs/wk; the arithmetic is sketched below.
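The arithmetic behind that break-even figure, using only the two price cells from the spec table; electricity and the H200's higher per-hour throughput are noted in comments rather than modeled.

```python
CARD_MSRP = 1999.00     # RTX 5090
CLOUD_RATE = 3.70       # H200, $/hr
HOURS_PER_WEEK = 20

hours = CARD_MSRP / CLOUD_RATE   # ~540 GPU-hours of rental
weeks = hours / HOURS_PER_WEEK   # ~27 weeks, about 6 months
print(f"{hours:.0f} hrs -> {weeks:.0f} weeks")
# Adjusting for the H200 doing ~2x the work per hour roughly doubles this,
# which is how the figure lands around the 12-month mark.
```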
When the H200 wins.
Models that don't fit and batches that don't finish: 70B in FP16, 128k-context serving, anything FP8-trained at scale. The H200 also wins outright when you need elasticity, e.g. bursting to a hundred GPUs for an afternoon. A back-of-envelope fit check follows.
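A minimal version of the "doesn't fit" test: weights alone are params × bits ÷ 8, and KV cache, activations, and runtime overhead come on top, so real footprints run higher than these numbers.

```python
def weight_gb(params_billions: float, bits: float) -> float:
    """Nominal weight footprint in GB: params x bits / 8."""
    return params_billions * bits / 8

for name, p, bits in [
    ("Llama 3.1 70B FP16",  70, 16),  # ~140 GB: H200 territory
    ("Llama 3.1 70B 4-bit", 70, 4),   # ~35 GB: tight even before overhead
    ("Qwen 2.5 32B 4-bit",  32, 4),   # ~16 GB: comfortable in 32 GB
]:
    print(f"{name}: ~{weight_gb(p, bits):.0f} GB of weights")
```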
Bottom line.
Buy the 5090 for steady local workloads under 32 GB; rent the H200 for the bursts and the bigger models. The two stack; they don't replace each other.
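The same verdict as a toy decision rule; the 32 GB and 20 hrs/week thresholds come from this article, and a real decision would also weigh power, resale, and multi-GPU bursts.

```python
def pick_gpu(model_footprint_gb: float, hours_per_week: float) -> str:
    """Toy encoding of the verdict above, not a capacity planner."""
    if model_footprint_gb > 32:
        return "rent H200"             # it simply doesn't fit the 5090
    if hours_per_week >= 20:
        return "buy RTX 5090"          # steady use amortises the card
    return "rent H200 for the bursts"  # occasional use: elasticity wins

print(pick_gpu(16, 30))   # e.g. Qwen 32B 4-bit as a daily driver -> buy RTX 5090
print(pick_gpu(140, 5))   # e.g. 70B FP16, occasional use -> rent H200
```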