Hardware · H100 SXM · NVIDIA
Datacenter · Hopper · released 2022
Issue: April 22, 2026

H100 SXM. Specs, benchmarks, $/hr.

The benchmark every other datacenter card is measured against. 989 FP16 TFLOPS, 80 GB HBM3, 3.35 TB/s memory bandwidth — and a 4× cloud price spread between Vast.ai ($1.65/hr) and AWS ($6.88/hr) for the same silicon.

§ 01 · Specs

H100 SXM, specified.

Dense FP16 from the NVIDIA datasheet. Bandwidth is peak; sustained will be lower. Price is an indicative cloud hourly rate as of the date stamped at the top; verified rates span $1.65/hr (Vast.ai) to $6.88/hr (AWS).

[Chart] Architectural lineage · FP16 TFLOPS over recent NVIDIA generations.
Vendor | NVIDIA
Tier | Datacenter
Generation | Hopper
VRAM | 80 GB · HBM3
Memory bandwidth | 3,350 GB/s
FP16 dense | 989 TFLOPS
TDP | 700 W
Released | 2022
Price | ~$2.50/hr cloud
Status | Widely available
Fig 1 · Single-card spec sheet. FP16 is dense (not sparse). Bandwidth is peak HBM3.
§ 02 · Benchmarks

Eleven workloads, one card.

Throughput on the same set of repeatable workloads we use across the register. Quantisation is matched across cards in the same row; latency is reported as p95 in the methodology notes.

Numbers without a measurement on this chip are marked "—". Cross-card comparisons live on the head-to-head pages.
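Single-stream tok/s here means one request at batch size 1: generated tokens divided by wall time, with p95 latency taken over repeated runs. A minimal sketch of those two definitions, not the register's actual harness; `generate_fn` is a hypothetical stand-in for any model's generate call:

```python
import time
from statistics import quantiles

def single_stream_tok_s(generate_fn, prompt: str, new_tokens: int = 256) -> float:
    """One batch-1 request: tok/s = tokens generated / wall-clock seconds."""
    start = time.perf_counter()
    generate_fn(prompt, max_new_tokens=new_tokens)  # hypothetical generate call
    return new_tokens / (time.perf_counter() - start)

def p95_seconds(latencies: list[float]) -> float:
    """p95 over repeated runs, as the methodology notes report."""
    return quantiles(latencies, n=100)[94]  # 95th of the 99 cut points
```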

Category | Workload | Metric | H100 SXM | Notes
LLM Inference | Llama 3.1 8B | tok/s | 240 | tokens per second · single-stream · FP16
LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 65 | tokens per second · single-stream · INT4 GPTQ
LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 80 | tokens per second · single-stream · INT4
LLM Inference | Mistral 7B | tok/s | 280 | tokens per second · single-stream · FP16
Image Generation | SDXL 1024×1024 | it/s | 10.5 | iterations per second · 30 steps · FP16
Image Generation | Flux.1 Dev | it/s | 5.4 | iterations per second · 28 steps · FP16
Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 22 | samples per second · seq 2k · BF16
Training | ResNet-50 · ImageNet | img/s | 5,400 | images per second · BS=256 · BF16
Computer Vision | YOLOv8x · inference | FPS | 540 | frames per second · BS=1 · FP16
Computer Vision | SAM ViT-H | masks/s | 15 | masks per second · 1024×1024 · FP16
Audio/Video | Whisper Large v3 | × RT | 48 | multiples of real-time · CPU offload off
Fig 2 · Per-workload throughput on a single H100 SXM. Higher is better for every metric here.
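Fig 1's hourly price and Fig 2's throughput combine into a cost-per-token figure the tables don't state directly. A back-of-envelope sketch using the page's own numbers, $2.50/hr against Llama 3.1 8B's single-stream 240 tok/s:

```python
def usd_per_million_tokens(usd_per_hour: float, tok_per_s: float) -> float:
    """Hourly rental cost divided by tokens generated in that hour."""
    return usd_per_hour / (tok_per_s * 3600) * 1e6

# Fig 1 price x Fig 2 Llama 3.1 8B single-stream throughput:
print(f"{usd_per_million_tokens(2.50, 240):.2f}")  # ~2.89 USD per 1M tokens
```

At Vast.ai's $1.65/hr the same arithmetic lands near $1.91 per million tokens; single-stream is the worst case, since batching amortises the hourly rate across many concurrent streams.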
§ 03 · VRAM fit

What fits in 80 GB, really.

FP16 weights = 2 bytes × parameters. INT4 cuts that 4× with small quality loss. Fine-tuning needs 3–4× more memory for gradients, optimiser, activations.
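That rule of thumb is one line of arithmetic. A sketch of it (helper names are hypothetical, and real quantised checkpoints carry some format overhead, which is why Fig 3's 70B INT4 entry reads 36 GB rather than a clean 35):

```python
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weights_gb(params_billion: float, precision: str) -> float:
    """Weights-only footprint in GB: parameter count x bytes per parameter."""
    return params_billion * BYTES_PER_PARAM[precision]

def finetune_gb(params_billion: float, multiplier: float = 3.5) -> float:
    """FP16 fine-tune budget: weights plus gradients, optimiser state and
    activations, using an assumed midpoint of the 3-4x range above."""
    return weights_gb(params_billion, "fp16") * multiplier

print(weights_gb(70, "fp16"))   # 140.0 GB -- over this card's 80 GB envelope
print(weights_gb(70, "int4"))   # 35.0 GB -- fits, matching Fig 3's verdict
```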

Model | Params | FP16 | INT8 | INT4 | Fits on H100 SXM?
Llama 3.1 8B | 8B | 16 GB | 8 GB | 4 GB | FP16, INT8 and INT4
Qwen 2.5 14B | 14B | 28 GB | 14 GB | 7 GB | FP16, INT8 and INT4
Qwen 2.5 32B | 32B | 64 GB | 32 GB | 16 GB | FP16, INT8 and INT4
Llama 3.1 70B | 70B | 140 GB | 70 GB | 36 GB | INT8 and INT4 only
DeepSeek V3 | 671B MoE | 1.3 TB | 671 GB | 336 GB | No
Llama 3.1 405B | 405B | 810 GB | 405 GB | 203 GB | No
Fig 3 · Memory budget per model at each precision against this card's 80 GB envelope.
§ 04 · Compare

H100 SXM head-to-heads.

Side-by-side spec tables and matched-quantisation throughput numbers for the comparisons people actually search for.
/hardware/h100-vs-h200

H100 SXM vs H200

Same FP16 ceiling (989 TFLOPS), but H200 pushes VRAM from 80 to 141 GB and carries 1.4× the memory bandwidth.

/hardware/h100-vs-mi300x

H100 SXM vs MI300X

NVIDIA vs AMD at the datacenter: 80 GB HBM3 vs 192 GB HBM3, and 989 vs 1,307 dense FP16 TFLOPS (MI300X's oft-quoted 2,615 is the sparse figure).

Read next

Three places to go from here.

Hub
Hardware register
Every accelerator on the leaderboard, with FP16 TFLOPS, VRAM, $/hr, and energy cost in one place.
Per-chip page
RTX 5090
First consumer card with 32 GB. The ceiling for a single-PSU workstation.
Per-chip page
RTX 4090
Still the workhorse: 24 GB GDDR6X, $0.29/hr on Vast.ai spot.