← Hardware · H200
NVIDIA · Datacenter · Hopper · released 2024 · Issue: April 22, 2026

H200. Specs, benchmarks, $/hr.

Same compute as an H100, almost double the memory: 141 GB HBM3e at 4.8 TB/s. The first datacenter card to fit a 70B in full FP16 on a single GPU, and the cheapest path to long-context serving on Hopper.

§ 01 · Specs

H200, specified.

Dense FP16 from the NVIDIA datasheet. Bandwidth is peak; sustained will be lower. Price reflects the cheapest verified hourly rate as of the date stamped at the top.

[Chart] Architectural lineage: FP16 TFLOPS over recent NVIDIA generations.
Vendor: NVIDIA
Tier: Datacenter
Generation: Hopper
VRAM: 141 GB · HBM3e
Memory bandwidth: 4,800 GB/s
FP16 dense: 989 TFLOPS
TDP: 700 W
Released: 2024
Price: ~$3.70/hr cloud
Status: Available
Fig 1 · Single-card spec sheet. FP16 is dense (not sparse). Bandwidth is peak HBM.
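The hourly price and the single-stream decode rates below combine into a serving cost per token. A minimal sketch (the helper name and the example figures taken from Figs 1–2 are mine, not a published calculator):

```python
# Convert an hourly rental price into $/1M output tokens at a given
# single-stream decode rate. Illustrative helper, not an official tool.
def usd_per_million_tokens(usd_per_hour: float, tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000

# Llama 3.1 8B at 280 tok/s on a $3.70/hr H200:
print(f"${usd_per_million_tokens(3.70, 280):.2f} per 1M tokens")  # ≈ $3.67
```

Single-stream is the worst case; batched serving amortises the same hourly rate over many concurrent streams.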
§ 02 · Benchmarks

Eleven workloads, one card.

Throughput on the same set of repeatable workloads we use across the register. Same quantisation across cards in the same row; latency reported with p95 in the methodology notes.

Numbers without a measurement on this chip are marked "—". Cross-card comparisons live on the head-to-head pages.

Category | Workload | Metric | H200 | Notes
LLM Inference | Llama 3.1 8B | tok/s | 280 | tokens per second · single-stream · FP16
LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 78 | tokens per second · single-stream · INT4 GPTQ
LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 95 | tokens per second · single-stream · INT4
LLM Inference | Mistral 7B | tok/s | 320 | tokens per second · single-stream · FP16
Image Generation | SDXL 1024×1024 | it/s | 11.2 | iterations per second · 30 steps · FP16
Image Generation | Flux.1 Dev | it/s | 5.9 | iterations per second · 28 steps · FP16
Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 26 | samples per second · seq 2k · BF16
Training | ResNet-50 · ImageNet | img/s | 5,800 | images per second · BS=256 · BF16
Computer Vision | YOLOv8x · inference | FPS | 580 | frames per second · BS=1 · FP16
Computer Vision | SAM ViT-H | masks/s | 16.5 | masks per second · 1024×1024 · FP16
Audio/Video | Whisper Large v3 | × RT | 52 | multiples of real-time · CPU offload off
Fig 2 · Per-workload throughput on a single H200. Higher is better.
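The single-stream LLM rows track memory bandwidth: each generated token streams the full weight set from HBM, so peak bandwidth divided by weight bytes gives a hard ceiling on tok/s. A back-of-envelope sketch (function name mine; it ignores KV-cache traffic and kernel overheads, which is why measured numbers sit below it):

```python
# Bandwidth-roofline ceiling for single-stream decode:
# tok/s <= peak bandwidth / bytes of weights read per token.
H200_BANDWIDTH_GBS = 4800  # peak HBM3e, from Fig 1

def decode_ceiling_tok_s(weight_gb: float,
                         bandwidth_gbs: float = H200_BANDWIDTH_GBS) -> float:
    return bandwidth_gbs / weight_gb

print(decode_ceiling_tok_s(16))  # Llama 3.1 8B FP16: 300 tok/s ceiling (measured 280)
print(decode_ceiling_tok_s(36))  # Llama 3.1 70B INT4: ~133 tok/s ceiling (measured 78)
```

The 8B result lands close to the roofline; the 70B INT4 result sits well under it, consistent with dequantisation overhead dominating at that size.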
§ 03 · VRAM fit

What fits in 141 GB, really.

FP16 weights take 2 bytes per parameter, so weight memory ≈ 2 × parameter count. INT4 cuts that 4× with small quality loss. Fine-tuning needs 3–4× more memory again for gradients, optimiser states, and activations.

Model | Params | FP16 | INT8 | INT4 | Fits on H200?
Llama 3.1 8B | 8B | 16 GB | 8 GB | 4 GB | FP16, INT8 and INT4
Qwen 2.5 14B | 14B | 28 GB | 14 GB | 7 GB | FP16, INT8 and INT4
Qwen 2.5 32B | 32B | 64 GB | 32 GB | 16 GB | FP16, INT8 and INT4
Llama 3.1 70B | 70B | 140 GB | 70 GB | 36 GB | FP16, INT8 and INT4
DeepSeek V3 | 671B MoE | 1.3 TB | 671 GB | 336 GB | No
Llama 3.1 405B | 405B | 810 GB | 405 GB | 203 GB | No
Fig 3 · Memory budget per model at each precision against this card's 141 GB envelope.
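The arithmetic behind Fig 3 is simple enough to sketch directly. The helper names are mine; the sketch counts weight bytes only and deliberately ignores KV cache and activation memory, so "fits" means the weights fit, nothing more:

```python
# Weight-memory budget per precision against the H200's 141 GB.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def weight_gb(params_billion: float, precision: str) -> float:
    return params_billion * BYTES_PER_PARAM[precision]

def fits_on_h200(params_billion: float, precision: str, vram_gb: float = 141) -> bool:
    return weight_gb(params_billion, precision) <= vram_gb

print(weight_gb(70, "FP16"))      # 140.0 GB: just inside the 141 GB envelope
print(fits_on_h200(405, "INT4"))  # False: 202.5 GB of weights alone
```

The 70B FP16 case is the headline: 140 GB of weights leaves almost no headroom for KV cache, so long-context 70B serving still wants INT8 or INT4.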
§ 04 · Compare

H200 head-to-heads.

Side-by-side spec tables and matched-quantisation throughput numbers for the comparisons people actually search for.
/hardware/h200-vs-b200

H200 vs B200

Hopper’s last word against Blackwell’s first. B200 brings roughly 2.3× the dense FP16, about a third more VRAM (141 → 192 GB), and two-thirds more memory bandwidth (4.8 → 8 TB/s).

/hardware/h100-vs-h200

H100 SXM vs H200

Same FP16 ceiling (989 TFLOPS dense), but H200 nearly doubles VRAM (80 → 141 GB) and delivers 1.4× the bandwidth (3.35 → 4.8 TB/s).

/hardware/rtx-5090-vs-h200

RTX 5090 vs H200

The biggest consumer card vs a real datacenter accelerator. When does the 5090 actually catch up?

Read next

Three places to go from here.

Hub
Hardware register
Every accelerator on the leaderboard, with FP16 TFLOPS, VRAM, $/hr, and energy cost in one place.
Per-chip page
RTX 5090
First consumer card with 32 GB. The ceiling for a single-PSU workstation.
Per-chip page
RTX 4090
Still the workhorse: 24 GB GDDR6X, $0.29/hr on Vast.ai spot.