H100 SXM. Specs, benchmarks, $/hr.

The benchmark every other datacenter card is measured against. 989 FP16 TFLOPS, 80 GB HBM3, 3.35 TB/s memory bandwidth — and a 4× cloud price spread between Vast.ai ($1.65/hr) and AWS ($6.88/hr) for the same silicon.

Verified April 2026

§ 01 · Specs

H100 SXM, specified.

Dense FP16 is taken from the NVIDIA datasheet. Bandwidth is peak; sustained will be lower. Price reflects the cheapest verified hourly rate as of the date stamped at the top.

Architectural lineage

FP16 TFLOPS over recent NVIDIA generations.

Vendor	NVIDIA
Tier	Datacenter
Generation	Hopper
VRAM	80 GB · HBM3
Memory bandwidth	3,350 GB/s
FP16 dense	989 TFLOPS
TDP	700 W
Released	2022
Price	~$2.50/hr cloud
Status	Widely available

Fig 1 · Single-card spec sheet. FP16 is dense (not sparse). Bandwidth is peak HBM/GDDR.
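One way to read the FP16 and bandwidth rows together is the roofline ridge point: the arithmetic intensity (FLOPs per byte moved) below which a kernel is limited by memory bandwidth rather than compute. The sketch below uses only the peak figures from the table; the function names are illustrative, and because sustained bandwidth is lower than peak, real kernels will land under these ceilings.

```python
# Roofline sketch from the spec-sheet peaks (assumed: peak figures, not sustained).
PEAK_FP16_TFLOPS = 989.0      # dense FP16, from the spec table
PEAK_BANDWIDTH_GBS = 3350.0   # peak HBM3 bandwidth, from the spec table


def ridge_point(tflops: float = PEAK_FP16_TFLOPS,
                gbs: float = PEAK_BANDWIDTH_GBS) -> float:
    """FLOPs per byte at which compute and bandwidth ceilings meet."""
    return (tflops * 1e12) / (gbs * 1e9)


def attainable_tflops(intensity: float,
                      tflops: float = PEAK_FP16_TFLOPS,
                      gbs: float = PEAK_BANDWIDTH_GBS) -> float:
    """Upper bound on throughput for a kernel with the given FLOPs/byte."""
    bandwidth_bound = intensity * gbs * 1e9 / 1e12  # TFLOPS if memory-limited
    return min(tflops, bandwidth_bound)


if __name__ == "__main__":
    print(f"ridge point: {ridge_point():.1f} FLOP/byte")
    for intensity in (1, 50, 500):
        print(intensity, "FLOP/byte ->", attainable_tflops(intensity), "TFLOPS max")
```

A kernel below roughly 295 FLOP/byte cannot reach the 989 TFLOPS ceiling on this card, however well it is tuned.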

§ 02 · Benchmarks

Eleven workloads, one card.

Throughput on the same set of repeatable workloads we use across the register. Same quantisation across cards in the same row; latency reported with p95 in the methodology notes.

Workloads without a measurement on this chip are marked "—". Cross-card comparisons live on the head-to-head pages.

Category	Workload	Metric	H100 SXM	Notes
LLM Inference	Llama 3.1 8B	tok/s	240	tokens per second · single-stream · FP16
LLM Inference	Llama 3.1 70B · 4-bit	tok/s	65	tokens per second · single-stream · INT4 GPTQ
LLM Inference	Qwen 2.5 32B · 4-bit	tok/s	80	tokens per second · single-stream · INT4
LLM Inference	Mistral 7B	tok/s	280	tokens per second · single-stream · FP16
Image Generation	SDXL 1024×1024	it/s	10.5	iterations per second · 30 steps · FP16
Image Generation	Flux.1 Dev	it/s	5.4	iterations per second · 28 steps · FP16
Training	Fine-tune Llama 3.1 8B LoRA	samples/s	22	samples per second · seq 2k · BF16
Training	ResNet-50 · ImageNet	img/s	5,400	images per second · BS=256 · BF16
Computer Vision	YOLOv8x · inference	FPS	540	frames per second · BS=1 · FP16
Computer Vision	SAM ViT-H	masks/s	15	masks per second · 1024×1024 · FP16
Audio/Video	Whisper Large v3	× RT	48	multiples of real-time · CPU offload off

Fig 2 · Per-workload throughput on a single H100 SXM. Higher is better unless the metric is a price.
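To relate throughput to the hourly prices quoted at the top of the page, tokens per second can be converted into a cost per million tokens. This is a minimal sketch using the Llama 3.1 8B row (240 tok/s) and the Vast.ai and AWS rates; the function name is illustrative. It assumes the card is fully utilised at the single-stream rate, so batched serving would come out cheaper per token.

```python
# Cost per million generated tokens from tok/s and $/hr.
# Assumes 100% utilisation at the single-stream rate from the table.
SECONDS_PER_HOUR = 3600


def usd_per_million_tokens(tokens_per_second: float, usd_per_hour: float) -> float:
    """Dollars spent to generate one million tokens at a steady rate."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return usd_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    llama_8b_tok_s = 240  # Llama 3.1 8B, FP16, single-stream (from the table)
    for provider, rate in (("Vast.ai", 1.65), ("AWS", 6.88)):
        cost = usd_per_million_tokens(llama_8b_tok_s, rate)
        print(f"{provider}: ${cost:.2f} per million tokens")
```

The ratio between the two results matches the hourly price spread, since throughput is the same silicon in both cases.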

§ 03 · VRAM fit

What fits in 80 GB, really.

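FP16 weights = 2 bytes × parameters. INT8 halves that and INT4 cuts it 4×, usually with a small quality loss. The table counts weights only; KV cache and activations need headroom on top. Fine-tuning needs 3–4× more memory for gradients, optimiser state and activations.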

Model	Params	FP16	INT8	INT4	Fits on H100 SXM?
Llama 3.1 8B	8B	16 GB	8 GB	4 GB	FP16, INT8 and INT4
Qwen 2.5 14B	14B	28 GB	14 GB	7 GB	FP16, INT8 and INT4
Qwen 2.5 32B	32B	64 GB	32 GB	16 GB	FP16, INT8 and INT4
Llama 3.1 70B	70B	140 GB	70 GB	35 GB	INT8 and INT4 only
DeepSeek V3	671B MoE	1.3 TB	671 GB	336 GB	No
Llama 3.1 405B	405B	810 GB	405 GB	203 GB	No

Fig 3 · Memory budget per model at each precision against this card's 80 GB envelope.
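The table's arithmetic can be sketched in a few lines. This assumes nominal parameter counts, 1 GB = 10^9 bytes, and weights only (no KV cache, activations or framework overhead), so a result near 80 GB leaves little room in practice. The helper names are illustrative.

```python
# Weight-memory estimate per precision against an 80 GB card.
# Assumption: weights only; real deployments need extra headroom.
CARD_VRAM_GB = 80
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Billions of parameters × bytes per parameter = gigabytes of weights."""
    return params_billion * BYTES_PER_PARAM[precision]


def fitting_precisions(params_billion: float,
                       vram_gb: float = CARD_VRAM_GB) -> list[str]:
    """Precisions whose weights fit within the card's VRAM."""
    return [p for p in BYTES_PER_PARAM
            if weight_gb(params_billion, p) <= vram_gb]


if __name__ == "__main__":
    for name, size in (("Llama 3.1 8B", 8), ("Qwen 2.5 32B", 32),
                       ("Llama 3.1 70B", 70), ("Llama 3.1 405B", 405)):
        fits = fitting_precisions(size) or ["none"]
        print(f"{name}: {', '.join(fits)}")
```

For fine-tuning, multiplying `weight_gb` by the 3–4× factor mentioned above gives a rough first estimate of the total budget.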

§ 04 · Compare

H100 SXM head-to-heads.

Side-by-side spec tables and matched-quantisation throughput numbers for the comparisons people actually search for.

/hardware/h100-vs-h200

H100 SXM vs H200 →

Same FP16 ceiling (989 TFLOPS), but the H200 has about 1.8× the VRAM (80 → 141 GB) and 1.4× the bandwidth.

/hardware/h100-vs-mi300x

H100 SXM vs MI300X →

NVIDIA vs AMD at the datacenter. 80 GB HBM3 vs 192 GB HBM3, 989 dense FP16 TFLOPS vs 1,307 (AMD's 2,615 figure assumes structured sparsity).

Three places to go from here.

Hub

Hardware register

Every accelerator on the leaderboard, with FP16 TFLOPS, VRAM, $/hr, and energy cost in one place.

Per-chip page

RTX 5090

First consumer card with 32 GB. The ceiling for a single-PSU workstation.

Per-chip page

RTX 4090

Still the workhorse: 24 GB GDDR6X, $0.29/hr on Vast.ai spot.