← Hardware · MI300X
AMD · Datacenter · CDNA 3 · released 2023 · Issue: April 22, 2026

MI300X. Specs, benchmarks, $/hr.

AMD's flagship CDNA 3 accelerator. 192 GB HBM3 at 5.3 TB/s — the same VRAM envelope as a B200 — and 2,615 FP16 TFLOPS, all on ROCm. The price/perf story holds when ROCm holds.

§ 01 · Specs

MI300X, specified.

Dense FP16 from the AMD datasheet. Bandwidth is the peak figure; sustained will be lower. Price reflects the cheapest verified hourly rate as of the date stamped at the top.
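Those two peak figures imply a break-even arithmetic intensity: any kernel doing fewer FLOPs per byte moved is bandwidth-bound, which is the common case for single-stream LLM decode. A minimal sketch using only the datasheet numbers above (the function name is illustrative):

```python
# Ratio of peak dense FP16 compute to peak HBM3 bandwidth (datasheet figures).
# Kernels whose arithmetic intensity falls below this ratio are bandwidth-bound.
PEAK_FP16_TFLOPS = 2615        # dense FP16, TFLOPS
PEAK_BW_TBPS = 5.3             # HBM3 peak bandwidth, TB/s

def arithmetic_intensity_break_even(tflops: float, tb_per_s: float) -> float:
    """FLOPs that must be performed per byte moved to saturate compute."""
    return (tflops * 1e12) / (tb_per_s * 1e12)

ratio = arithmetic_intensity_break_even(PEAK_FP16_TFLOPS, PEAK_BW_TBPS)
print(f"{ratio:.0f} FLOPs/byte")   # ≈ 493: below this, HBM bandwidth is the ceiling
```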

Architectural lineage · FP16 TFLOPS over recent AMD generations.
Vendor              AMD
Tier                Datacenter
Generation          CDNA 3
VRAM                192 GB · HBM3
Memory bandwidth    5,300 GB/s
FP16 dense          2,615 TFLOPS
TDP                 750 W
Released            2023
Price               ~$3–5/hr cloud
Status              Available
Fig 1 · Single-card spec sheet. FP16 is dense (not sparse). Bandwidth is peak HBM/GDDR.
§ 02 · Benchmarks

Eleven workloads, one card.

Throughput on the same set of repeatable workloads we use across the register. Quantisation is matched for every card in a given row; latency is reported as p95 in the methodology notes.

Numbers without a measurement on this chip are marked "—". Cross-card comparisons live on the head-to-head pages.
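The p95 convention mentioned above is just the 95th percentile of per-request latencies. A minimal sketch with the standard library (the sample numbers are illustrative, not MI300X measurements):

```python
import statistics

def p95(latencies_ms):
    """95th percentile via the inclusive method (linear interpolation)."""
    return statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]

# Illustrative single-stream latencies in milliseconds; the outlier (90 ms)
# is what pulls p95 far above the median.
samples = [31, 29, 33, 30, 35, 28, 90, 32, 30, 31]
print(f"p95 = {p95(samples):.1f} ms")
```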

Category         | Workload                    | Metric    | MI300X | Notes
LLM Inference    | Llama 3.1 8B                | tok/s     | 320    | tokens per second · single-stream · FP16
LLM Inference    | Llama 3.1 70B · 4-bit       | tok/s     | 95     | tokens per second · single-stream · INT4 GPTQ
LLM Inference    | Qwen 2.5 32B · 4-bit        | tok/s     | 115    | tokens per second · single-stream · INT4
LLM Inference    | Mistral 7B                  | tok/s     | 370    | tokens per second · single-stream · FP16
Image Generation | SDXL 1024×1024              | it/s      | 13     | iterations per second · 30 steps · FP16
Image Generation | Flux.1 Dev                  | it/s      | 6.8    | iterations per second · 28 steps · FP16
Training         | Fine-tune Llama 3.1 8B LoRA | samples/s | 30     | samples per second · seq 2k · BF16
Training         | ResNet-50 · ImageNet        | img/s     | 6,900  | images per second · BS=256 · BF16
Computer Vision  | YOLOv8x · inference         | FPS       | 680    | frames per second · BS=1 · FP16
Computer Vision  | SAM ViT-H                   | masks/s   | 19     | masks per second · 1024×1024 · FP16
Audio/Video      | Whisper Large v3            | × RT      | 60     | multiples of real-time · CPU offload off
Fig 2 · Per-workload throughput on a single MI300X. Higher is better unless the metric is a price.
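The single-stream tok/s rows track memory bandwidth: each generated token streams the full weight set from HBM, so peak bandwidth divided by weight bytes gives a hard ceiling. A rough sketch under assumed bytes-per-parameter figures; measured throughput lands below the ceiling because of KV-cache traffic, sustained-vs-peak bandwidth, and kernel overheads:

```python
def decode_tok_s_upper_bound(params_b: float, bytes_per_param: float,
                             peak_bw_gb_s: float = 5300) -> float:
    """Bandwidth ceiling for single-stream decode: tok/s <= BW / weight bytes."""
    weight_gb = params_b * bytes_per_param
    return peak_bw_gb_s / weight_gb

# Llama 3.1 8B at FP16 (2 bytes/param): ceiling ~331 tok/s vs 320 measured.
print(f"{decode_tok_s_upper_bound(8, 2):.0f} tok/s")
# Llama 3.1 70B at INT4 (~0.5 bytes/param): ceiling ~151 tok/s vs 95 measured.
print(f"{decode_tok_s_upper_bound(70, 0.5):.0f} tok/s")
```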
§ 03 · VRAM fit

What fits in 192 GB, really.

FP16 weights = 2 bytes × parameters. INT4 cuts that 4× with a small quality loss. Fine-tuning needs 3–4× more memory for gradients, optimiser state, and activations.
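That rule of thumb can be written down directly. A minimal sketch: the 3.5× fine-tune multiplier is an assumption in the middle of the stated 3–4× range, and real footprints add KV cache and framework overhead on top:

```python
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params_b: float, precision: str) -> float:
    """Weight memory in GB: parameters (billions) x bytes per parameter."""
    return params_b * BYTES_PER_PARAM[precision]

def fits(params_b: float, precision: str, vram_gb: float = 192,
         fine_tune: bool = False) -> bool:
    need = weight_gb(params_b, precision)
    if fine_tune:
        # Assumed mid-range of the 3-4x gradient/optimiser/activation overhead.
        need *= 3.5
    return need <= vram_gb

print(fits(70, "fp16"))                  # 140 GB of weights -> True on 192 GB
print(fits(405, "int4"))                 # ~203 GB -> False
print(fits(70, "fp16", fine_tune=True))  # ~490 GB -> False
```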

Model          | Params   | FP16   | INT8   | INT4   | Fits on MI300X?
Llama 3.1 8B   | 8B       | 16 GB  | 8 GB   | 4 GB   | FP16, INT8 and INT4
Qwen 2.5 14B   | 14B      | 28 GB  | 14 GB  | 7 GB   | FP16, INT8 and INT4
Qwen 2.5 32B   | 32B      | 64 GB  | 32 GB  | 16 GB  | FP16, INT8 and INT4
Llama 3.1 70B  | 70B      | 140 GB | 70 GB  | 36 GB  | FP16, INT8 and INT4
DeepSeek V3    | 671B MoE | 1.3 TB | 671 GB | 336 GB | No
Llama 3.1 405B | 405B     | 810 GB | 405 GB | 203 GB | No
Fig 3 · Memory budget per model at each precision against this card's 192 GB envelope.
§ 04 · Compare

MI300X head-to-heads.

Side-by-side spec tables and matched-quantisation throughput numbers for the comparisons people actually search for.
/hardware/h100-vs-mi300x

H100 SXM vs MI300X

NVIDIA vs AMD at the datacenter. 80 GB HBM3 vs 192 GB HBM3, 989 FP16 TFLOPS vs 2,615.

Read next

Three places to go from here.

Hub
Hardware register
Every accelerator on the leaderboard, with FP16 TFLOPS, VRAM, $/hr, and energy cost in one place.
Per-chip page
RTX 5090
First consumer card with 32 GB. The ceiling for a single-PSU workstation.
Per-chip page
RTX 4090
Still the workhorse: 24 GB GDDR6X, $0.29/hr on Vast.ai spot.