← Hardware · H200 vs B200
Head-to-head · matched quantisation · Issue: April 22, 2026
Matched precision · same workloads · April 2026

H200 vs B200.

Hopper's last word against Blackwell's first. The B200 lands 4.5× the FP16 TFLOPS and 1.7× the memory bandwidth — for ~1.6× the cloud hourly rate. The question is whether your workload actually pulls on either.

§ 01 · Specs

Side by side, on paper.

Datasheet specs only. Throughput on real workloads follows in §02 — the gap there is often smaller than the FP16 number suggests, because most ML workloads are memory-bound.
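The memory-bound point can be made precise with a roofline "ridge point": the arithmetic intensity (FLOPs per byte moved) at which a kernel stops being bandwidth-limited and starts being compute-limited. A minimal sketch using only the datasheet numbers from this page:

```python
def ridge_point(tflops: float, bandwidth_gbs: float) -> float:
    """FLOPs per byte of memory traffic needed to saturate compute
    before saturating bandwidth (roofline ridge point)."""
    return (tflops * 1e12) / (bandwidth_gbs * 1e9)

# Datasheet figures from the spec table on this page (dense FP16).
h200 = ridge_point(989, 4_800)    # ≈ 206 FLOPs/byte
b200 = ridge_point(4_500, 8_000)  # ≈ 562 FLOPs/byte
print(f"H200 ridge: {h200:.0f} FLOPs/byte; B200 ridge: {b200:.0f} FLOPs/byte")
```

A kernel below the ridge point sees only the bandwidth delta (1.7×), not the FLOPS delta (4.5×) — and the B200's ridge sits nearly 3× higher, so more workloads land on the memory-bound side of its roofline.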

| Spec | H200 | B200 |
| --- | --- | --- |
| Vendor | NVIDIA | NVIDIA |
| Tier | Datacenter | Datacenter |
| Generation | Hopper | Blackwell |
| VRAM | 141 GB HBM3e | 192 GB HBM3e |
| Bandwidth | 4,800 GB/s | 8,000 GB/s |
| FP16 dense | 989 TFLOPS | 4,500 TFLOPS |
| TDP | 700 W | 1,000 W |
| Released | 2024 | 2025 |
| Status | Available | Limited |
| Price | ~$3.70/hr cloud | ~$6+/hr cloud |
Fig 2 · Spec deltas. Copper dot marks the column with the bigger number for that axis (lower W is better; otherwise higher).
§ 02 · Benchmarks

On real workloads.

Same model revision, same quantisation, same batch size on both cards. Where one side has no measurement we leave the cell empty rather than extrapolate.

Methodology: how we test.

| Category | Workload | Metric | H200 | B200 | Δ |
| --- | --- | --- | --- | --- | --- |
| LLM Inference | Llama 3.1 8B | tok/s | 280 | 540 | 1.93× |
| LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 78 | 165 | 2.12× |
| LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 95 | 200 | 2.11× |
| LLM Inference | Mistral 7B | tok/s | 320 | 620 | 1.94× |
| Image Generation | SDXL 1024×1024 | it/s | 11.2 | 22 | 1.96× |
| Image Generation | Flux.1 Dev | it/s | 5.9 | 12 | 2.03× |
| Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 26 | 52 | 2.00× |
| Training | ResNet-50 · ImageNet | img/s | 5,800 | 11,500 | 1.98× |
| Computer Vision | YOLOv8x · inference | FPS | 580 | 1,100 | 1.90× |
| Computer Vision | SAM ViT-H | masks/s | 16.5 | 32 | 1.94× |
| Audio/Video | Whisper Large v3 | × RT | 52 | 100 | 1.92× |
Fig 3 · Δ column shows B200 ÷ H200 on the same workload. Copper dot marks the winner per row.
§ 03 · Verdict

When each one wins.

The right card is the one whose envelope covers your worst-case workload — not the one with the bigger TFLOPS number.

Pick the H200

When the H200 wins.

When you need 141 GB but not 4,500 TFLOPS — long-context inference, MoE serving, and most production LLM workloads cap out on memory bandwidth long before they touch the B200 ceiling.
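The bandwidth cap is easy to estimate: single-stream decode must stream every weight byte once per generated token, so tok/s is bounded above by bandwidth ÷ model size. A rough sketch (it ignores KV-cache traffic and kernel overhead, so real numbers land well below the ceiling):

```python
def decode_ceiling_toks(bandwidth_gbs: float, params_billions: float,
                        bytes_per_param: float) -> float:
    """Upper bound on single-stream decode tok/s: every weight byte
    must cross the memory bus once per token."""
    model_gb = params_billions * bytes_per_param
    return bandwidth_gbs / model_gb

# Llama 3.1 70B at 4-bit ≈ 0.5 bytes/param ≈ 35 GB of weights.
print(f"H200 ceiling: {decode_ceiling_toks(4_800, 70, 0.5):.0f} tok/s")
print(f"B200 ceiling: {decode_ceiling_toks(8_000, 70, 0.5):.0f} tok/s")
```

Both cards measure below their ceilings in Fig 3 (78 and 165 tok/s), and the measured 2.1× gap exceeds the 1.67× bandwidth ratio — so some of the B200's win here likely comes from software and kernel paths, not raw bandwidth alone.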

Pick the B200

When the B200 wins.

When the step is FLOP-bound: large-batch training, dense matmul kernels, and ResNet-style throughput. B200 also wins outright on availability of FP4/FP8 paths if your stack is current.

Bottom line. For most teams in 2026 the H200 is the right answer at $3.70/hr — B200 only pays back when your training is FLOP-saturated and your scheduler keeps it that way.
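To run the payback math for your own workload, divide the hourly rate by sustained throughput. Using the quoted cloud prices and the Fig 3 Llama 3.1 70B · 4-bit row as an example:

```python
def usd_per_million_tokens(usd_per_hr: float, tok_per_s: float) -> float:
    """Cost per million generated tokens at 100% utilization."""
    tokens_per_hr = tok_per_s * 3_600
    return usd_per_hr / tokens_per_hr * 1e6

h200 = usd_per_million_tokens(3.70, 78)   # ≈ $13.2 / Mtok
b200 = usd_per_million_tokens(6.00, 165)  # ≈ $10.1 / Mtok
print(f"H200: ${h200:.2f}/Mtok, B200: ${b200:.2f}/Mtok")
```

At full utilization the B200 is cheaper per token on this workload — but the advantage assumes your scheduler keeps it saturated. Idle hours bill at the full $6+/hr rate, which is exactly the caveat in the bottom line above.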

§ 04 · More head-to-heads

Other matchups, same format.

/hardware/h100-vs-h200

H100 SXM vs H200

Same FP16 ceiling (989 TFLOPS), but the H200 nearly doubles VRAM (80 → 141 GB) and delivers 1.4× the bandwidth.

/hardware/rtx-5090-vs-rtx-4090

RTX 5090 vs RTX 4090

Blackwell vs Ada. 32 GB GDDR7 against 24 GB GDDR6X, at 1.27× the FP16.

/hardware/rtx-5090-vs-h200

RTX 5090 vs H200

The biggest consumer card vs a real datacenter accelerator. When does the 5090 actually catch up?

Read next

Three places to go from here.

Per-chip page
H200
141 GB HBM3e. The first datacenter card to fit a 70B model in FP16 on a single GPU.
Per-chip page
B200
4,500 FP16 TFLOPS. The current top of the leaderboard, when you can find one.
Hub
Hardware register
Every accelerator on the leaderboard with FP16 TFLOPS, VRAM, $/hr, and energy cost.