H100 SXM vs H200.

Head-to-head · matched quantisation · same workloads · Issue: April 22, 2026

Same Hopper SM count, same 989 FP16 TFLOPS — but the H200 nearly doubles VRAM (80 → 141 GB) and lifts memory bandwidth from 3.35 TB/s to 4.8 TB/s. The compute ceiling is unchanged; the envelope around it grew.

§ 01 · Specs

Side by side, on paper.

Datasheet specs only. Throughput on real workloads follows in §02; the gap there is wider than the identical FP16 number suggests but narrower than the 1.43× bandwidth delta, because most ML workloads are memory-bound rather than compute-bound.
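One way to see why: compare each card's compute-to-bandwidth ratio (the roofline "ridge point") against a workload's arithmetic intensity. A minimal Python sketch using the datasheet numbers above; the 1-FLOP-per-byte decode intensity is a textbook approximation for single-stream decode, not a measurement from this issue.

```python
# Roofline sketch: a workload is memory-bound when its arithmetic
# intensity (FLOPs per byte moved) sits below the card's ridge point.
cards = {
    "H100 SXM": {"tflops": 989, "tb_s": 3.35},
    "H200":     {"tflops": 989, "tb_s": 4.80},
}

for name, c in cards.items():
    # FLOPs per byte required before compute, not bandwidth, is the limit
    ridge = c["tflops"] / c["tb_s"]
    print(f"{name}: ridge point ~= {ridge:.0f} FLOPs/byte")

# Single-stream FP16 decode reads each weight once per token and does
# ~2 FLOPs per 2-byte weight: intensity ~= 1 FLOP/byte. That is far
# below either ridge point (~295 and ~206), so decode rides the
# bandwidth line -- exactly the axis the H200 improves.
```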

| Spec       | H100 SXM         | H200             |
|------------|------------------|------------------|
| Vendor     | NVIDIA           | NVIDIA           |
| Tier       | Datacenter       | Datacenter       |
| Generation | Hopper           | Hopper           |
| VRAM       | 80 GB HBM3       | 141 GB HBM3e     |
| Bandwidth  | 3,350 GB/s       | 4,800 GB/s       |
| FP16 dense | 989 TFLOPS       | 989 TFLOPS       |
| TDP        | 700 W            | 700 W            |
| Released   | 2022             | 2024             |
| Status     | Widely available | Available        |
| Price      | ~$2.50/hr cloud  | ~$3.70/hr cloud  |

Fig 2 · Spec deltas. Lower is better for TDP; higher wins on every other row.
§ 02 · Benchmarks

On real workloads.

Same model revision, same quantisation, same batch size on both cards. Where one side has no measurement we leave the cell empty rather than extrapolate.

Methodology: how we test.
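We don't reproduce the full harness here, but the core tok/s measurement reduces to a warmed-up timing loop. A sketch, with `generate` as a hypothetical stand-in for whatever serving stack is under test (not a real API):

```python
import time

def tokens_per_second(generate, prompt, n_tokens, warmup=3, runs=5):
    """Median decode throughput. `generate` is a stand-in for the
    serving stack under test; it must decode exactly n_tokens."""
    for _ in range(warmup):          # warm caches, trigger any JIT/compile
        generate(prompt, n_tokens)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        generate(prompt, n_tokens)
        samples.append(n_tokens / (time.perf_counter() - t0))
    return sorted(samples)[len(samples) // 2]

# Dummy stand-in so the sketch runs as-is (~1 ms per token):
print(tokens_per_second(lambda p, n: time.sleep(n * 0.001), "hello", 128))
```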

| Category         | Workload                     | Metric    | H100 SXM | H200  | Δ     |
|------------------|------------------------------|-----------|----------|-------|-------|
| LLM Inference    | Llama 3.1 8B                 | tok/s     | 240      | 280   | 1.17× |
| LLM Inference    | Llama 3.1 70B · 4-bit        | tok/s     | 65       | 78    | 1.20× |
| LLM Inference    | Qwen 2.5 32B · 4-bit         | tok/s     | 80       | 95    | 1.19× |
| LLM Inference    | Mistral 7B                   | tok/s     | 280      | 320   | 1.14× |
| Image Generation | SDXL 1024×1024               | it/s      | 10.5     | 11.2  | 1.07× |
| Image Generation | Flux.1 Dev                   | it/s      | 5.4      | 5.9   | 1.09× |
| Training         | Fine-tune Llama 3.1 8B LoRA  | samples/s | 22       | 26    | 1.18× |
| Training         | ResNet-50 · ImageNet         | img/s     | 5,400    | 5,800 | 1.07× |
| Computer Vision  | YOLOv8x · inference          | FPS       | 540      | 580   | 1.07× |
| Computer Vision  | SAM ViT-H                    | masks/s   | 15       | 16.5  | 1.10× |
| Audio/Video      | Whisper Large v3             | × RT      | 48       | 52    | 1.08× |

Fig 3 · Δ column shows H200 ÷ H100 SXM on the same workload; values above 1.00× favour the H200.
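The LLM rows cluster around 1.14–1.20×, short of the raw 1.43× bandwidth ratio. A back-of-envelope ceiling check shows why bandwidth still sets the shape: in the memory-bound limit, decode tok/s is roughly bandwidth divided by the bytes of weights streamed per token. A sketch, assuming the 4-bit 70B streams about 40 GB per token (an assumption covering weights plus overhead):

```python
# Memory-bound decode ceiling: tok/s ~= bandwidth / bytes-of-weights per token.
weights_gb = 40  # Llama 3.1 70B at 4-bit, incl. overhead (assumption)
measured = {"H100 SXM": (3350, 65), "H200": (4800, 78)}  # GB/s, tok/s from Fig 3

for name, (bw, tok_s) in measured.items():
    ceiling = bw / weights_gb
    print(f"{name}: ceiling ~= {ceiling:.0f} tok/s, "
          f"measured {tok_s} ({tok_s / ceiling:.0%} of ceiling)")

# Neither card reaches its ceiling (attention compute, scheduling
# overhead), which is why the measured 1.20x delta undershoots the
# raw 1.43x bandwidth ratio.
```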
§ 03 · Verdict

When each one wins.

The right card is the one whose envelope covers your worst-case workload — not the one with the bigger TFLOPS number.

Pick the H100 SXM

When the H100 SXM wins.

Compute-bound workloads whose models fit neatly in 80 GB. FP8 training, ResNet-style throughput, anything that already saturates the H100's compute with the model fully resident: the extra capacity and bandwidth buy nothing.
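"Saturates" is checkable. Estimate model FLOPs utilisation (MFU) from measured throughput using the standard ~6 FLOPs per parameter per trained token approximation; dense transformer training on Hopper commonly lands in the 35–45% MFU range, and if your run is already there, more memory won't move it. A sketch with an illustrative throughput figure (assumed, not from Fig 3):

```python
# MFU sketch: ~6 FLOPs per parameter per trained token (forward + backward).
params = 8e9        # Llama 3.1 8B
tok_per_s = 8_000   # measured training throughput (illustrative assumption)
peak_tflops = 989   # H100 SXM / H200 dense FP16/BF16 peak

mfu = 6 * params * tok_per_s / (peak_tflops * 1e12)
print(f"MFU ~= {mfu:.0%}")  # ~= 39%: solidly flop-limited for dense training

# At this utilisation the kernel mix is compute-limited, so the H200's
# extra bandwidth and capacity sit idle; rent more H100s instead.
```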

Pick the H200

When the H200 wins.

Memory-bound workloads. A 70B checkpoint in FP16 just fits on a single H200 (roughly 140 GB of weights); a 128k-context KV cache fits with headroom; MoE shards stop spilling across cards. Real-world LLM serving is mostly memory-bandwidth-bound, which is exactly where the H200's 1.43× advantage shows up.
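The KV-cache claim is plain arithmetic from the model config. A sketch for Llama 3.1 70B (80 layers, 8 KV heads under GQA, head dim 128) with FP16 keys and values:

```python
# KV cache bytes = 2 (K and V) x layers x kv_heads x head_dim x seq_len x dtype bytes
layers, kv_heads, head_dim = 80, 8, 128  # Llama 3.1 70B (GQA)
seq_len, fp16_bytes = 128 * 1024, 2

gib = 2 * layers * kv_heads * head_dim * seq_len * fp16_bytes / 2**30
print(f"128k-context KV cache: {gib:.0f} GiB per sequence")  # ~= 40 GiB

# On an 80 GB card that 40 GiB competes hard with the weights;
# on 141 GB it leaves room for quantised weights plus batching.
```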

Bottom line. If your worst-case workload is LLM inference, pick the H200. If it's training a model that already fits on and saturates an H100, the upgrade buys nothing: just rent more H100s.

§ 04 · More head-to-heads

Other matchups, same format.

/hardware/h200-vs-b200

H200 vs B200

Hopper’s last word against Blackwell’s first. 4.5× the FP16, almost 50% more VRAM bandwidth.

/hardware/rtx-5090-vs-rtx-4090

RTX 5090 vs RTX 4090

Blackwell vs Ada. 32 GB GDDR7 against 24 GB GDDR6X, at 1.27× the FP16.

/hardware/rtx-5090-vs-h200

RTX 5090 vs H200

The biggest consumer card vs a real datacenter accelerator. When does the 5090 actually catch up?

Read next

Three places to go from here.

Per-chip page
H100 SXM
989 FP16 TFLOPS. The benchmark every other accelerator is measured against.
Per-chip page
H200
141 GB HBM3e. The first datacenter card to fit a 70B model in FP16 on a single GPU.
Hub
Hardware register
Every accelerator on the leaderboard with FP16 TFLOPS, VRAM, $/hr, and energy cost.