NVIDIA's volume datacenter card against AMD's flagship CDNA 3 accelerator. The MI300X brings 2.4× the VRAM, 1.6× the memory bandwidth, and 1.3× the dense FP16 TFLOPS, at a cloud price that varies more than the silicon spec ever could.
Datasheet specs only. Throughput on real workloads follows in §02; the gaps there track memory bandwidth more closely than peak FLOPS, because most ML workloads are memory-bound (a back-of-envelope roofline follows the table).
| Spec | H100 SXM | MI300X |
|---|---|---|
| Vendor | NVIDIA | AMD |
| Tier | Datacenter | Datacenter |
| Generation | Hopper | CDNA 3 |
| VRAM | 80 GB HBM3 | 192 GB HBM3 |
| Bandwidth | 3,350 GB/s | 5,300 GB/s |
| FP16 dense | 989 TFLOPS | 1,307 TFLOPS |
| TDP | 700 W | 750 W |
| Released | 2022 | 2023 |
| Status | Widely available | Available |
| Price | ~$2.50/hr cloud | ~$3–5/hr cloud |
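Why bandwidth matters more than the FLOPS headline: in the decode phase of LLM inference, every weight is streamed from HBM once per generated token, so single-stream throughput is capped by bandwidth. A back-of-envelope sketch using the table's numbers (the model size is a nominal assumption):

```python
# Decode-phase roofline: single-stream tok/s is capped by how fast the
# card can stream the full weight set from HBM once per token.
GB = 1e9

bandwidth = {            # from the spec table, bytes/s
    "H100 SXM": 3350 * GB,
    "MI300X": 5300 * GB,
}

weight_bytes = 8e9 * 2   # nominal Llama 3.1 8B in FP16: ~8B params x 2 bytes

for card, bw in bandwidth.items():
    print(f"{card}: ~{bw / weight_bytes:.0f} tok/s single-stream ceiling")

# H100 SXM: ~209 tok/s, MI300X: ~331 tok/s. That 1.58x gap is set by
# bandwidth, not the FP16 spec, and it brackets the measured deltas in
# §02. Batched serving amortises weight reads across requests, which is
# how measured numbers can exceed this single-stream cap.
```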
Methodology: how we test.
Same model revision, same quantisation, same batch size on both cards. Where one side has no measurement we leave the cell empty rather than extrapolate. (A minimal harness sketch follows the results table.)
| Category | Workload | Metric | H100 SXM | MI300X | Δ (MI300X ÷ H100) |
|---|---|---|---|---|---|
| LLM Inference | Llama 3.1 8B | tok/s | 240 | 320 | 1.33× |
| LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 65 | 95 | 1.46× |
| LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 80 | 115 | 1.44× |
| LLM Inference | Mistral 7B | tok/s | 280 | 370 | 1.32× |
| Image Generation | SDXL 1024×1024 | it/s | 10.5 | 13 | 1.24× |
| Image Generation | Flux.1 Dev | it/s | 5.4 | 6.8 | 1.26× |
| Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 22 | 30 | 1.36× |
| Training | ResNet-50 · ImageNet | img/s | 5,400 | 6,900 | 1.28× |
| Computer Vision | YOLOv8x · inference | FPS | 540 | 680 | 1.26× |
| Computer Vision | SAM ViT-H | masks/s | 15 | 19 | 1.27× |
| Audio/Video | Whisper Large v3 | × RT | 48 | 60 | 1.25× |
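For the tok/s rows, a minimal sketch of the kind of harness the methodology implies. The model ID, prompt, and batch size are illustrative placeholders; a real run would pin exact model revisions and CUDA/ROCm versions. Note that PyTorch's ROCm builds expose the `torch.cuda` namespace, so the same script runs unchanged on both cards.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.1-8B"  # placeholder; pin the exact revision

tok = AutoTokenizer.from_pretrained(MODEL_ID)
tok.pad_token = tok.pad_token or tok.eos_token
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Identical work on both cards: fixed batch, fixed length, greedy decoding.
prompts = ["Summarise the history of HBM in one paragraph."] * 8
inputs = tok(prompts, return_tensors="pt", padding=True).to(model.device)

model.generate(**inputs, max_new_tokens=8, do_sample=False)  # warm-up pass

torch.cuda.synchronize()  # also valid on ROCm builds of PyTorch
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# Assumes no sequence stops early at EOS; good enough for a sketch.
new_tokens = (out.shape[-1] - inputs["input_ids"].shape[-1]) * out.shape[0]
print(f"{new_tokens / elapsed:.0f} tok/s")
```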
The right card is the one whose envelope covers your worst-case workload — not the one with the bigger TFLOPS number.
When the H100 SXM wins.
When your stack assumes CUDA: custom kernels, NVLink-tight multi-GPU, FP8 Transformer Engine paths, and any production training pipeline that has been hardened on NVIDIA.
When the MI300X wins.
When VRAM is the constraint and ROCm is fine: 192 GB on a single card holds a 70B model in FP16 with headroom, or a much larger MoE shard than an H100 can. Llama and DeepSeek inference stacks on ROCm 6+ are now competitive. (A quick capacity check follows.)
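The weight-only arithmetic behind that claim. These are floors, not totals: KV cache, activations, and allocator overhead come on top, and the parameter counts are nominal.

```python
# Weight-only footprints; real serving needs headroom beyond these.
GIB = 2**30

def weights_gib(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / GIB

print(f"70B FP16:  {weights_gib(70e9, 2):.0f} GiB")    # ~130 GiB: fits 192 GB, not 80 GB
print(f"70B 4-bit: {weights_gib(70e9, 0.5):.0f} GiB")  # ~33 GiB: fits either card
print(f"8B FP16:   {weights_gib(8e9, 2):.0f} GiB")     # ~15 GiB: fits anywhere
```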
Bottom line. For inference-heavy teams already on ROCm or willing to port, the MI300X price/perf is hard to beat. For training-heavy teams already on CUDA, the H100 stays the safer call.
H200 vs B200. Hopper's last word against Blackwell's first: 4.5× the FP16, almost 50% more VRAM bandwidth.
H100 vs H200. Same FP16 ceiling (989 TFLOPS), but the H200 nearly doubles the VRAM (80 → 141 GB) at 1.4× the bandwidth.
RTX 5090 vs RTX 4090. Blackwell vs Ada: 32 GB GDDR7 against 24 GB GDDR6X, at 1.27× the FP16.