Same Hopper SM count, same 989 FP16 TFLOPS — but the H200 nearly doubles VRAM (80 → 141 GB) and lifts memory bandwidth from 3.35 TB/s to 4.8 TB/s. The compute ceiling is unchanged; the envelope around it grew.
Datasheet specs only. Throughput on real workloads follows in §02 — the gap there is often smaller than the FP16 number suggests, because most ML workloads are memory-bound.
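A quick back-of-envelope shows why. During single-stream decode every weight is read once per generated token, so bandwidth, not TFLOPS, sets the ceiling. A minimal sketch using the bandwidth figures from the spec table below; the 4-bit 70B weight size is approximate, and KV-cache traffic and quantisation overhead are ignored:

```python
# Decode is memory-bound: each generated token reads every weight once,
# so bandwidth caps single-stream tokens/s no matter how many TFLOPS idle.
# Illustrative numbers, not measurements.

GB = 1e9  # datasheet-style decimal gigabytes

def decode_ceiling_tok_s(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper bound on single-stream decode throughput from bandwidth alone."""
    return bandwidth_gb_s / weight_gb

# Llama 3.1 70B at 4-bit: roughly 70e9 params * 0.5 bytes = ~35 GB of weights
# (quantisation scales and KV-cache reads would add more traffic on top).
weights_gb = 70e9 * 0.5 / GB

for card, bw_gb_s in [("H100 SXM", 3350.0), ("H200", 4800.0)]:
    print(f"{card}: <= {decode_ceiling_tok_s(bw_gb_s, weights_gb):.0f} tok/s")
# H100 SXM: <= 96 tok/s
# H200: <= 137 tok/s
```

Both measured 70B numbers in §02 sit under their ceilings, and the gap between the cards follows the bandwidth spec rather than the identical compute spec.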
| Spec | H100 SXM | H200 |
|---|---|---|
| Vendor | NVIDIA | NVIDIA |
| Tier | Datacenter | Datacenter |
| Generation | Hopper | Hopper |
| VRAM | 80 GB HBM3 | 141 GB HBM3e |
| Bandwidth | 3,350 GB/s | 4,800 GB/s |
| FP16 dense | 989 TFLOPS | 989 TFLOPS |
| TDP | 700 W | 700 W |
| Released | 2022 | 2024 |
| Status | Widely available | Available |
| Price | ~$2.50/hr cloud | ~$3.70/hr cloud |
Methodology: how we test.
Same model revision, same quantisation, same batch size on both cards. Where one side has no measurement we leave the cell empty rather than extrapolate.
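The harness itself isn't reproduced here, but the shape of the measurement matters: warm-up runs first, throughput rather than latency, median over repeats. A minimal sketch, with `generate` a hypothetical stand-in for whatever invokes the model on the device under test:

```python
import time
from statistics import median
from typing import Callable

def tokens_per_second(generate: Callable[[int], int],
                      new_tokens: int = 512,
                      warmup: int = 2,
                      repeats: int = 5) -> float:
    """Median decode throughput over several timed runs.

    `generate(n)` is a placeholder: it should run n decode steps on the
    card and return the number of tokens actually produced.
    """
    for _ in range(warmup):
        generate(new_tokens)  # warm-up: load weights, compile kernels
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        produced = generate(new_tokens)
        rates.append(produced / (time.perf_counter() - start))
    return median(rates)  # median resists a single slow outlier run
```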
| Category | Workload | Metric | H100 SXM | H200 | Δ |
|---|---|---|---|---|---|
| LLM Inference | Llama 3.1 8B | tok/s | 240 | 280 | 1.17× |
| LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 65 | 78 | 1.20× |
| LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 80 | 95 | 1.19× |
| LLM Inference | Mistral 7B | tok/s | 280 | 320 | 1.14× |
| Image Generation | SDXL 1024×1024 | it/s | 10.5 | 11.2 | 1.07× |
| Image Generation | Flux.1 Dev | it/s | 5.4 | 5.9 | 1.09× |
| Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 22 | 26 | 1.18× |
| Training | ResNet-50 · ImageNet | img/s | 5,400 | 5,800 | 1.07× |
| Computer Vision | YOLOv8x · inference | FPS | 540 | 580 | 1.07× |
| Computer Vision | SAM ViT-H | masks/s | 15 | 16.5 | 1.10× |
| Audio/Video | Whisper Large v3 | × realtime | 48 | 52 | 1.08× |
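One way to read the table: set each speedup against the ~1.48× cloud price premium from the spec table. A sketch under the assumption that those list prices hold for your provider:

```python
# Perf-per-dollar: H200 speedups from the benchmark table vs its price
# premium ($3.70/hr over $2.50/hr, ~1.48x). A ratio under 1.0 means the
# H100 SXM delivers more work per dollar on that workload.

premium = 3.70 / 2.50  # ~1.48x

speedups = {
    "Llama 3.1 70B 4-bit": 78 / 65,      # 1.20x
    "SDXL 1024x1024":      11.2 / 10.5,  # 1.07x
    "LoRA fine-tune 8B":   26 / 22,      # 1.18x
    "ResNet-50 training":  5800 / 5400,  # 1.07x
}

for workload, speedup in speedups.items():
    print(f"{workload}: {speedup:.2f}x faster, {speedup / premium:.2f}x perf/$")
```

On these numbers the H100 SXM wins on cost per unit of work across the board; the H200's case is capability (workloads the 80 GB card cannot fit at all), not price.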
The right card is the one whose envelope covers your worst-case workload — not the one with the bigger TFLOPS number.
When the H100 SXM wins. Compute-bound workloads whose models fit neatly in 80 GB: FP8 training, ResNet-style throughput, anything that already saturates the H100 with the model resident in memory. For those, the extra memory does nothing.
When the H200 wins. Memory-bound workloads. A 70B model's FP16 weights fit on a single H200; a 128k-context KV cache fits with headroom; MoE shards stop spilling across cards. Real-world LLM serving is mostly memory-bandwidth-bound, which is where the H200's 1.4× bandwidth advantage shows up.
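A capacity sketch makes those claims concrete, assuming Llama 3.1 70B's published shape (80 layers, 8 KV heads under grouped-query attention, head dim 128) and FP16 KV entries; activation buffers and allocator fragmentation are ignored, so treat the totals as floors:

```python
GB = 1e9  # decimal GB, matching the datasheet figures

def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_el: int = 2) -> float:
    """KV cache size: K and V entries per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_el * tokens / GB

kv_128k = kv_cache_gb(128 * 1024)   # ~43 GB at FP16
fp16_weights = 70e9 * 2 / GB        # 140 GB
int4_weights = 70e9 * 0.5 / GB      # 35 GB

print(f"128k-context KV cache: {kv_128k:.0f} GB")
print(f"70B FP16 weights: {fp16_weights:.0f} GB (over 80 GB, just under 141 GB)")
print(f"70B 4-bit + 128k KV: {int4_weights + kv_128k:.0f} GB "
      f"(no headroom on 80 GB, ample on 141 GB)")
```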
Bottom line. If your worst-case workload is LLM inference, pick H200. If it’s training a model that already fits and saturates an H100, the upgrade buys nothing — just rent more H100s.
Hopper’s last word against Blackwell’s first. 4.5× the FP16, almost 50% more VRAM bandwidth.
Blackwell vs Ada. 32 GB GDDR7 against 24 GB GDDR6X, at 1.27× the FP16.
The biggest consumer card vs a real datacenter accelerator. When does the 5090 actually catch up?