Head-to-head · matched quantisation · Issue: April 22, 2026
Matched precision · same workloads · April 2026

H100 SXM vs MI300X.

NVIDIA's volume datacenter card against AMD's flagship CDNA 3 accelerator. The MI300X brings 2.4× the VRAM and 2.6× the FP16 TFLOPS — at a cloud price that varies more than the silicon spec ever could.

§ 01 · Specs

Side by side, on paper.

Datasheet specs only. Throughput on real workloads follows in §02 — the gap there is often smaller than the FP16 number suggests, because most ML workloads are memory-bound.
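
A quick sanity check on that claim, using only the Fig 2 datasheet numbers: for memory-bound kernels the achievable speedup is capped by the bandwidth ratio, not the compute ratio. A minimal sketch of the arithmetic:

```python
# Why the real-workload gap tracks bandwidth, not TFLOPS: for memory-bound
# kernels, the speedup ceiling is the bandwidth ratio. All figures are the
# datasheet numbers from Fig 2.

h100 = {"fp16_tflops": 989, "bandwidth_gbs": 3_350}
mi300x = {"fp16_tflops": 2_615, "bandwidth_gbs": 5_300}

compute_ratio = mi300x["fp16_tflops"] / h100["fp16_tflops"]        # ~2.64x
bandwidth_ratio = mi300x["bandwidth_gbs"] / h100["bandwidth_gbs"]  # ~1.58x

print(f"compute ceiling:   {compute_ratio:.2f}x")    # 2.64x
print(f"bandwidth ceiling: {bandwidth_ratio:.2f}x")  # 1.58x
# The measured deltas in Fig 3 cluster at 1.24-1.46x: under the 1.58x
# bandwidth ceiling, nowhere near the 2.64x compute ceiling.
```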

| Spec | H100 SXM | MI300X |
|---|---|---|
| Vendor | NVIDIA | AMD |
| Tier | Datacenter | Datacenter |
| Generation | Hopper | CDNA 3 |
| VRAM | 80 GB HBM3 | 192 GB HBM3 |
| Bandwidth | 3,350 GB/s | 5,300 GB/s |
| FP16 dense | 989 TFLOPS | 2,615 TFLOPS |
| TDP | 700 W | 750 W |
| Released | 2022 | 2023 |
| Status | Widely available | Available |
| Price | ~$2.50/hr cloud | ~$3–5/hr cloud |
Fig 2 · Spec deltas. Copper dot marks the winning card per row (lower is better for TDP; higher for everything else).
§ 02 · Benchmarks

On real workloads.

Same model revision, same quantisation, same batch size on both cards. Where one side has no measurement we leave the cell empty rather than extrapolate.

Methodology: how we test.
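
For context on what a tok/s cell means, here is the shape of such a measurement. This is a minimal sketch, not our actual harness; `generate` is a stand-in for whichever inference engine is under test and is assumed to return the number of tokens it produced.

```python
import time

def measure_tok_s(generate, prompt: str, max_new_tokens: int,
                  warmup: int = 3, runs: int = 10) -> float:
    """Mean generated tok/s over several timed runs, after warmup."""
    for _ in range(warmup):
        generate(prompt, max_new_tokens)   # warm caches, JIT, graphs
    total_tokens, total_seconds = 0, 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate(prompt, max_new_tokens)
        total_seconds += time.perf_counter() - start
    return total_tokens / total_seconds
```

Both cards run the same script at the same batch size; only the backend underneath `generate` changes.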

| Category | Workload | Metric | H100 SXM | MI300X | Δ |
|---|---|---|---|---|---|
| LLM Inference | Llama 3.1 8B | tok/s | 240 | 320 | 1.33× |
| LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 65 | 95 | 1.46× |
| LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 80 | 115 | 1.44× |
| LLM Inference | Mistral 7B | tok/s | 280 | 370 | 1.32× |
| Image Generation | SDXL 1024×1024 | it/s | 10.5 | 13 | 1.24× |
| Image Generation | Flux.1 Dev | it/s | 5.4 | 6.8 | 1.26× |
| Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 22 | 30 | 1.36× |
| Training | ResNet-50 · ImageNet | img/s | 5,400 | 6,900 | 1.28× |
| Computer Vision | YOLOv8x · inference | FPS | 540 | 680 | 1.26× |
| Computer Vision | SAM ViT-H | masks/s | 15 | 19 | 1.27× |
| Audio/Video | Whisper Large v3 | × RT | 48 | 60 | 1.25× |
Fig 3 · Δ column shows MI300X ÷ H100 SXM on the same workload. Copper dot marks the winner per row.
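
If you want Fig 3 as a single number, the geometric mean of the Δ column is the honest aggregate (values transcribed from the table above; the aggregation itself is ours):

```python
import math

# Δ column from Fig 3, MI300X ÷ H100 SXM per workload.
deltas = [1.33, 1.46, 1.44, 1.32, 1.24, 1.26, 1.36, 1.28, 1.26, 1.27, 1.25]

geomean = math.exp(sum(math.log(d) for d in deltas) / len(deltas))
print(f"Geometric mean speedup: {geomean:.2f}x")  # ~1.31x
```
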
§ 03 · Verdict

When each one wins.

The right card is the one whose envelope covers your worst-case workload — not the one with the bigger TFLOPS number.

Pick the H100 SXM

When the H100 SXM wins.

When your stack assumes CUDA. Custom kernels, NVLink-tight multi-GPU, FP8 Transformer Engine paths, and any production training pipeline that has been hardened on NVIDIA.
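
One mitigating detail if you are weighing a port: PyTorch's ROCm builds reuse the `torch.cuda` namespace, so plain device code is often source-compatible. A minimal check that runs unchanged on both cards (`torch.version.hip` is `None` on CUDA builds):

```python
import torch

if torch.cuda.is_available():
    backend = "ROCm/HIP" if torch.version.hip else "CUDA"
    print(f"{torch.cuda.get_device_name(0)} via {backend}")
else:
    print("No accelerator visible to this PyTorch build")
```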

Pick the MI300X

When the MI300X wins.

When VRAM is the constraint and ROCm is fine. 192 GB on a single card means a 70B in FP16 with headroom, or a much larger MoE shard than an H100 holds. Llama and DeepSeek inference stacks on ROCm 6+ are now competitive.
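
The headroom claim is straightforward arithmetic (weights only; KV cache and activations eat into the margin):

```python
import math

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight footprint, taking 1 GB = 1e9 bytes."""
    return params_billions * bytes_per_param

fp16_70b = weights_gb(70, 2)  # ~140 GB of weights alone
print(f"70B FP16 weights: ~{fp16_70b:.0f} GB")
print(f"MI300X, 192 GB: one card, ~{192 - fp16_70b:.0f} GB left for KV cache")
print(f"H100 SXM, 80 GB: needs {math.ceil(fp16_70b / 80)}+ cards, or 4-bit quantisation")
```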

Bottom line. For inference-heavy teams already on ROCm or willing to port, the MI300X price/perf is hard to beat at the low end of its rental range. For training-heavy teams already on CUDA, the H100 stays the safer call.
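
To make "hard to beat" concrete, divide the rental rate by throughput. A sketch using this issue's Llama 3.1 8B row and the Fig 2 cloud prices; the width of the MI300X price range is the whole story:

```python
def usd_per_mtok(usd_per_hr: float, tok_s: float) -> float:
    """Serving cost per million generated tokens at full utilisation."""
    return usd_per_hr / (tok_s * 3600 / 1e6)

print(f"H100 SXM @ $2.50/hr, 240 tok/s: ${usd_per_mtok(2.50, 240):.2f}/Mtok")
print(f"MI300X   @ $3.00/hr, 320 tok/s: ${usd_per_mtok(3.00, 320):.2f}/Mtok")
print(f"MI300X   @ $5.00/hr, 320 tok/s: ${usd_per_mtok(5.00, 320):.2f}/Mtok")
# $2.89 vs $2.60 vs $4.34: the verdict flips inside the MI300X price range,
# which is why the rental rate, not the silicon, decides this one.
```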

§ 04 · More head-to-heads

Other matchups, same format.

/hardware/h200-vs-b200

H200 vs B200

Hopper’s last word against Blackwell’s first. 4.5× the FP16, almost 50% more VRAM bandwidth.

/hardware/h100-vs-h200

H100 SXM vs H200

Same FP16 ceiling (989 TFLOPS), but the H200 nearly doubles VRAM (80 → 141 GB) and carries 1.4× the bandwidth.

/hardware/rtx-5090-vs-rtx-4090

RTX 5090 vs RTX 4090

Blackwell vs Ada. 32 GB GDDR7 against 24 GB GDDR6X, at 1.27× the FP16.

Read next

Three places to go from here.

Per-chip page
H100 SXM
989 FP16 TFLOPS. The benchmark every other accelerator is measured against.
Per-chip page
MI300X
192 GB HBM3, 2,615 FP16 TFLOPS. AMD’s answer to H200 — and it fits more.
Hub
Hardware register
Every accelerator on the leaderboard with FP16 TFLOPS, VRAM, $/hr, and energy cost.