Hopper's last word against Blackwell's first. The B200 lands 4.5× the FP16 TFLOPS and 1.7× the memory bandwidth — for ~1.6× the cloud hourly. The question is whether your workload actually pulls on either.
Datasheet specs only. Throughput on real workloads follows in §02 — the gap there is often smaller than the FP16 number suggests, because most ML workloads are memory-bound.
| Spec | H200 | B200 |
|---|---|---|
| Vendor | NVIDIA | NVIDIA |
| Tier | Datacenter | Datacenter |
| Generation | Hopper | Blackwell |
| VRAM | 141 GB HBM3e | 192 GB HBM3e |
| Bandwidth | 4,800 GB/s | 8,000 GB/s |
| FP16 Tensor (peak) | 989 TFLOPS | 4,500 TFLOPS |
| TDP | 700 W | 1000 W |
| Released | 2024 | 2025 |
| Status | Available | Limited |
| Price | ~$3.70/hr cloud | ~$6+/hr cloud |
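A quick way to see why memory-bound workloads ignore most of that FP16 gap: divide each card's headline FP16 figure by its HBM bandwidth to get a machine balance, the arithmetic intensity a kernel needs before extra FLOPs help. A minimal sketch using the table's numbers; treat the results as rough bounds rather than predictions.

```python
# Machine balance from the spec table above: FLOPs a kernel must do per byte
# moved from HBM before it stops being bandwidth-bound. Headline figures,
# so the numbers are upper bounds rather than predictions.
SPECS = {
    "H200": {"fp16_flops": 989e12,  "hbm_bw": 4.8e12},   # 4,800 GB/s
    "B200": {"fp16_flops": 4500e12, "hbm_bw": 8.0e12},   # 8,000 GB/s
}

for name, s in SPECS.items():
    balance = s["fp16_flops"] / s["hbm_bw"]
    print(f"{name}: ~{balance:.0f} FLOP/byte to become compute-bound")

# Single-stream LLM decode touches every weight once per token for ~2 FLOPs
# per 2-byte weight, i.e. roughly 1 FLOP/byte: far below either balance point,
# which is why such workloads scale with the 1.7x bandwidth gap, not the 4.5x
# FLOP gap.
```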
Methodology: how we test. Same model revision, same quantisation, same batch size on both cards. Where one side has no measurement we leave the cell empty rather than extrapolate.
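For reference, here is a minimal sketch of the kind of single-GPU timing loop that yields a tok/s figure. It is not the harness behind the table below: it uses Hugging Face transformers rather than a serving stack, and the checkpoint name, prompt, and decode length are assumptions.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # assumed checkpoint; any causal LM works

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16).to("cuda")
model.eval()

prompt = "Explain the difference between HBM3e and GDDR7 in one paragraph."
inputs = tok(prompt, return_tensors="pt").to("cuda")

# Warm-up so kernel compilation and cache allocation stay out of the timing.
model.generate(**inputs, max_new_tokens=16)
torch.cuda.synchronize()

t0 = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - t0

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tok/s (single stream, batch size 1)")
```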
| Category | Workload | Metric | H200 | B200 | Δ (B200 / H200) |
|---|---|---|---|---|---|
| LLM Inference | Llama 3.1 8B | tok/s | 280 | 540 | 1.93× |
| LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 78 | 165 | 2.12× |
| LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 95 | 200 | 2.11× |
| LLM Inference | Mistral 7B | tok/s | 320 | 620 | 1.94× |
| Image Generation | SDXL 1024×1024 | it/s | 11.2 | 22 | 1.96× |
| Image Generation | Flux.1 Dev | it/s | 5.9 | 12 | 2.03× |
| Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 26 | 52 | 2.00× |
| Training | ResNet-50 · ImageNet | img/s | 5,800 | 11,500 | 1.98× |
| Computer Vision | YOLOv8x · inference | FPS | 580 | 1,100 | 1.90× |
| Computer Vision | SAM ViT-H | masks/s | 16.5 | 32 | 1.94× |
| Audio/Video | Whisper Large v3 | × RT | 52 | 100 | 1.92× |
The right card is the one whose envelope covers your worst-case workload — not the one with the bigger TFLOPS number.
When the H200 wins. When you need 141 GB but not 4,500 TFLOPS — long-context inference, MoE serving, and most production LLM workloads cap out on memory bandwidth long before they touch the B200 ceiling.
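To make the capacity argument concrete, here is a rough VRAM budget for a long-context 70B deployment. The model shape matches Llama 3.1 70B (80 layers, 8 KV heads via GQA, head_dim 128); the context length, concurrency, and 4-bit weight size are assumed values, not figures from the benchmarks.

```python
# Rough VRAM budget for long-context 70B serving: a sketch, not a capacity planner.
def kv_cache_bytes(tokens, layers=80, kv_heads=8, head_dim=128, dtype_bytes=2):
    # Both K and V are cached per layer per token, hence the factor of 2.
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

ctx_len = 128_000      # tokens of context per sequence (assumed)
concurrent = 2         # sequences resident at once (assumed)
weights_gb = 70e9 * 0.5 / 1e9        # ~35 GB of 4-bit weights

kv_gb = kv_cache_bytes(ctx_len * concurrent) / 1e9
print(f"KV cache ~{kv_gb:.0f} GB + weights ~{weights_gb:.0f} GB "
      f"= ~{kv_gb + weights_gb:.0f} GB")   # ~119 GB: inside the H200's 141 GB
# The step that fills this memory is bandwidth-bound decode: capacity and
# bandwidth are what it pulls on, not the FLOP ceiling.
```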
When the B200 wins. When the step is FLOP-bound: large-batch training, dense matmul kernels, and ResNet-style throughput. The B200 also wins outright on the availability of FP4/FP8 paths if your stack is current.
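A quick check for whether a step is FLOP-bound: compare the arithmetic intensity of its dominant GEMM against the machine balance derived from the spec table (roughly 206 FLOP/byte for the H200 and 562 for the B200 on headline figures). The shapes below are illustrative assumptions, not taken from the benchmark table.

```python
# Arithmetic intensity of a GEMM: FLOPs per byte moved, assuming FP16 operands.
def gemm_intensity(m, n, k, dtype_bytes=2):
    flops = 2 * m * n * k
    bytes_moved = dtype_bytes * (m * k + k * n + m * n)  # read A and B, write C
    return flops / bytes_moved

# Large-batch training step: 8192 tokens through an 8192x8192 projection.
print(f"training GEMM: {gemm_intensity(8192, 8192, 8192):.0f} FLOP/byte")  # ~2700 -> FLOP-bound on both cards
# Batch-1 decode: one token through the same projection (a GEMV).
print(f"decode GEMV:   {gemm_intensity(1, 8192, 8192):.1f} FLOP/byte")     # ~1 -> bandwidth-bound everywhere
```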
Bottom line. For most teams in 2026 the H200 is the right answer at $3.70/hr — B200 only pays back when your training is FLOP-saturated and your scheduler keeps it that way.
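One way to turn that into a decision rule, using the article's approximate hourly rates: the B200 pays off only when the speedup you measure on your own workload clears its price premium. The two speedups passed in below are hypothetical, not values from the benchmark table.

```python
H200_RATE = 3.70   # $/hr, the article's approximate on-demand price
B200_RATE = 6.00   # $/hr, lower bound of the article's "~$6+/hr"

def cheaper_card(measured_speedup: float) -> str:
    """Pick the cheaper card per unit of work for a measured B200-vs-H200 speedup."""
    breakeven = B200_RATE / H200_RATE   # ~1.62x
    return "B200" if measured_speedup > breakeven else "H200"

# Hypothetical measurements:
print(cheaper_card(1.5))   # bandwidth-bound serving that tops out near 1.5x -> H200
print(cheaper_card(2.0))   # FLOP-saturated training step -> B200
```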
H100 vs H200. Same FP16 ceiling (989 TFLOPS), but the H200 nearly doubles the VRAM (80 GB → 141 GB) and carries 1.4× the bandwidth.
Blackwell vs Ada. 32 GB GDDR7 against 24 GB GDDR6X, at 1.27× the FP16.
The biggest consumer card vs a real datacenter accelerator. When does the 5090 actually catch up?