Hopper's last word against Blackwell's first. The B200 lands 4.5× the FP16 TFLOPS and 1.7× the memory bandwidth — for ~1.6× the cloud hourly. The question is whether your workload actually pulls on either.
Datasheet specs only. Throughput on real workloads follows in §02 — the gap there is often smaller than the FP16 number suggests, because most ML workloads are memory-bound.
| Spec | H200 | B200 |
|---|---|---|
| Vendor | NVIDIA | NVIDIA |
| Tier | Datacenter | Datacenter |
| Generation | Hopper | Blackwell |
| VRAM | 141 GB HBM3e | 192 GB HBM3e |
| Bandwidth | 4,800 GB/s | 8,000 GB/s |
| FP16 Tensor (peak) | 989 TFLOPS | 4,500 TFLOPS |
| TDP | 700 W | 1000 W |
| Released | 2024 | 2025 |
| Status | Available | Limited |
| Price | ~$3.70/hr cloud | ~$6+/hr cloud |
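A quick way to see why memory-bound workloads ignore most of that FP16 gap: divide each card's headline FP16 figure by its HBM bandwidth to get a machine balance, the arithmetic intensity a kernel needs before extra FLOPs help. A minimal sketch using the table's numbers; treat the results as rough bounds rather than predictions.

```python
# Machine balance from the spec table above: FLOPs a kernel must do per byte
# moved from HBM before it stops being bandwidth-bound. Headline figures,
# so the numbers are upper bounds rather than predictions.
SPECS = {
    "H200": {"fp16_flops": 989e12,  "hbm_bw": 4.8e12},   # 4,800 GB/s
    "B200": {"fp16_flops": 4500e12, "hbm_bw": 8.0e12},   # 8,000 GB/s
}

for name, s in SPECS.items():
    balance = s["fp16_flops"] / s["hbm_bw"]
    print(f"{name}: ~{balance:.0f} FLOP/byte to become compute-bound")

# Single-stream LLM decode touches every weight once per token for ~2 FLOPs
# per 2-byte weight, i.e. roughly 1 FLOP/byte: far below either balance point,
# which is why such workloads scale with the 1.7x bandwidth gap, not the 4.5x
# FLOP gap.
```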
Methodology: how we test. Same model revision, same quantisation, same batch size on both cards. Where one side has no measurement we leave the cell empty rather than extrapolate.
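For reference, here is a minimal sketch of the kind of single-GPU timing loop that yields a tok/s figure. It is not the harness behind the table below: it uses Hugging Face transformers rather than a serving stack, and the checkpoint name, prompt, and decode length are assumptions.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # assumed checkpoint; any causal LM works

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16).to("cuda")
model.eval()

prompt = "Explain the difference between HBM3e and GDDR7 in one paragraph."
inputs = tok(prompt, return_tensors="pt").to("cuda")

# Warm-up so kernel compilation and cache allocation stay out of the timing.
model.generate(**inputs, max_new_tokens=16)
torch.cuda.synchronize()

t0 = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - t0

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tok/s (single stream, batch size 1)")
```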
| Category | Workload | Metric | H200 | B200 | Δ (B200 / H200) |
|---|---|---|---|---|---|
| LLM Inference | Llama 3.1 8B | tok/s | 280 | 540 | 1.93× |
| LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 78 | 165 | 2.12× |
| LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 95 | 200 | 2.11× |
| LLM Inference | Mistral 7B | tok/s | 320 | 620 | 1.94× |
| Image Generation | SDXL 1024×1024 | it/s | 11.2 | 22 | 1.96× |
| Image Generation | Flux.1 Dev | it/s | 5.9 | 12 | 2.03× |
| Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 26 | 52 | 2.00× |
| Training | ResNet-50 · ImageNet | img/s | 5,800 | 11,500 | 1.98× |
| Computer Vision | YOLOv8x · inference | FPS | 580 | 1,100 | 1.90× |
| Computer Vision | SAM ViT-H | masks/s | 16.5 | 32 | 1.94× |
| Audio/Video | Whisper Large v3 | × RT | 52 | 100 | 1.92× |
The right card is the one whose envelope covers your worst-case workload — not the one with the bigger TFLOPS number.
When the H200 wins. When you need 141 GB but not 4,500 TFLOPS — long-context inference, MoE serving, and most production LLM workloads cap out on memory bandwidth long before they touch the B200 ceiling.
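To make the capacity argument concrete, here is a rough VRAM budget for a long-context 70B deployment. The model shape matches Llama 3.1 70B (80 layers, 8 KV heads via GQA, head_dim 128); the context length, concurrency, and 4-bit weight size are assumed values, not figures from the benchmarks.

```python
# Rough VRAM budget for long-context 70B serving: a sketch, not a capacity planner.
def kv_cache_bytes(tokens, layers=80, kv_heads=8, head_dim=128, dtype_bytes=2):
    # Both K and V are cached per layer per token, hence the factor of 2.
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

ctx_len = 128_000      # tokens of context per sequence (assumed)
concurrent = 2         # sequences resident at once (assumed)
weights_gb = 70e9 * 0.5 / 1e9        # ~35 GB of 4-bit weights

kv_gb = kv_cache_bytes(ctx_len * concurrent) / 1e9
print(f"KV cache ~{kv_gb:.0f} GB + weights ~{weights_gb:.0f} GB "
      f"= ~{kv_gb + weights_gb:.0f} GB")   # ~119 GB: inside the H200's 141 GB
# The step that fills this memory is bandwidth-bound decode: capacity and
# bandwidth are what it pulls on, not the FLOP ceiling.
```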
When the B200 wins. When the step is FLOP-bound: large-batch training, dense matmul kernels, and ResNet-style throughput. The B200 also wins outright on the availability of FP4/FP8 paths if your stack is current.
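A quick check for whether a step is FLOP-bound: compare the arithmetic intensity of its dominant GEMM against the machine balance derived from the spec table (roughly 206 FLOP/byte for the H200 and 562 for the B200 on headline figures). The shapes below are illustrative assumptions, not taken from the benchmark table.

```python
# Arithmetic intensity of a GEMM: FLOPs per byte moved, assuming FP16 operands.
def gemm_intensity(m, n, k, dtype_bytes=2):
    flops = 2 * m * n * k
    bytes_moved = dtype_bytes * (m * k + k * n + m * n)  # read A and B, write C
    return flops / bytes_moved

# Large-batch training step: 8192 tokens through an 8192x8192 projection.
print(f"training GEMM: {gemm_intensity(8192, 8192, 8192):.0f} FLOP/byte")  # ~2700 -> FLOP-bound on both cards
# Batch-1 decode: one token through the same projection (a GEMV).
print(f"decode GEMV:   {gemm_intensity(1, 8192, 8192):.1f} FLOP/byte")     # ~1 -> bandwidth-bound everywhere
```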
Bottom line. For most teams in 2026 the H200 is the right answer at $3.70/hr — B200 only pays back when your training is FLOP-saturated and your scheduler keeps it that way.
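One way to turn that into a decision rule, using the article's approximate hourly rates: the B200 pays off only when the speedup you measure on your own workload clears its price premium. The two speedups passed in below are hypothetical, not values from the benchmark table.

```python
H200_RATE = 3.70   # $/hr, the article's approximate on-demand price
B200_RATE = 6.00   # $/hr, lower bound of the article's "~$6+/hr"

def cheaper_card(measured_speedup: float) -> str:
    """Pick the cheaper card per unit of work for a measured B200-vs-H200 speedup."""
    breakeven = B200_RATE / H200_RATE   # ~1.62x
    return "B200" if measured_speedup > breakeven else "H200"

# Hypothetical measurements:
print(cheaper_card(1.5))   # bandwidth-bound serving that tops out near 1.5x -> H200
print(cheaper_card(2.0))   # FLOP-saturated training step -> B200
```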
H100 vs H200. Same FP16 ceiling (989 TFLOPS), but the H200 nearly doubles the VRAM (80 GB → 141 GB) and carries 1.4× the bandwidth.
Blackwell vs Ada. 32 GB GDDR7 against 24 GB GDDR6X, at 1.27× the FP16.
The biggest consumer card vs a real datacenter accelerator. When does the 5090 actually catch up?