← Hardware · RTX 5090
NVIDIA · Consumer · Blackwell · released 2025 · Issue: April 22, 2026

RTX 5090. Specs, benchmarks, $/hr.

The first consumer NVIDIA card with 32 GB of VRAM. 1.27× the FP16 throughput of a 4090, 1.78× the memory bandwidth, and enough headroom to run a 70B at INT4 with a long context (a full in-VRAM fit is just out of reach; see §03), without leaving the house.

§ 01 · Specs

RTX 5090, specified.

Dense FP16 from the NVIDIA datasheet. Bandwidth is peak; sustained will be lower. Price reflects MSRP or used-market street price as of the date stamped at the top.

[Chart] Architectural lineage · FP16 TFLOPS over recent NVIDIA generations.
Vendor            NVIDIA
Tier              Consumer
Generation        Blackwell
VRAM              32 GB · GDDR7
Memory bandwidth  1,792 GB/s
FP16 dense        209.5 TFLOPS
TDP               575 W
Released          2025
Price             $1,999 MSRP
Status            Available
Fig 1 · Single-card spec sheet. FP16 is dense (not sparse). Bandwidth is peak HBM/GDDR.
§ 02 · Benchmarks

Eleven workloads, one card.

Throughput on the same set of repeatable workloads we use across the register. Same quantisation across cards in the same row; latency reported with p95 in the methodology notes.

Numbers without a measurement on this chip are marked "—". Cross-card comparisons live on the head-to-head pages.

Category         | Workload                    | Metric    | RTX 5090 | Notes
LLM Inference    | Llama 3.1 8B                | tok/s     | 140      | tokens per second · single-stream · FP16
LLM Inference    | Llama 3.1 70B · 4-bit       | tok/s     | 38       | tokens per second · single-stream · INT4 GPTQ
LLM Inference    | Qwen 2.5 32B · 4-bit        | tok/s     | 48       | tokens per second · single-stream · INT4
LLM Inference    | Mistral 7B                  | tok/s     | 165      | tokens per second · single-stream · FP16
Image Generation | SDXL 1024×1024              | it/s      | 6.5      | iterations per second · 30 steps · FP16
Image Generation | Flux.1 Dev                  | it/s      | 3.4      | iterations per second · 28 steps · FP16
Training         | Fine-tune Llama 3.1 8B LoRA | samples/s | 12.5     | samples per second · seq 2k · BF16
Training         | ResNet-50 · ImageNet        | img/s     | 2,800    | images per second · BS=256 · BF16
Computer Vision  | YOLOv8x · inference         | FPS       | 320      | frames per second · BS=1 · FP16
Computer Vision  | SAM ViT-H                   | masks/s   | 9.2      | masks per second · 1024×1024 · FP16
Audio/Video      | Whisper Large v3            | × RT      | 28       | multiples of real-time · CPU offload off
Fig 2 · Per-workload throughput on a single RTX 5090. Higher is better unless the metric is a price.
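Single-stream decode numbers can be sanity-checked with a first-order roofline: generating one token reads every weight once, so tok/s is bounded by memory bandwidth over weight bytes. A minimal sketch, using the bandwidth from Fig 1 and weight sizes from Fig 3; the 0.8 sustained-efficiency factor is an assumption, not a measured value:

```python
# First-order decode roofline: one generated token ~ one full pass over the
# weights, so tokens/s <= memory bandwidth / bytes of weights in VRAM.

BANDWIDTH_GBPS = 1792  # RTX 5090 peak (Fig 1); sustained is lower

def decode_ceiling(weight_gb: float, efficiency: float = 0.8) -> float:
    """Upper bound on single-stream tokens/s at an assumed sustained efficiency."""
    return efficiency * BANDWIDTH_GBPS / weight_gb

# Llama 3.1 70B at INT4 is ~36 GB of weights (Fig 3):
print(f"{decode_ceiling(36):.0f} tok/s ceiling")  # → 40 tok/s ceiling
```

At ~40 tok/s the ceiling sits just above the 38 tok/s measured in Fig 2, which is what a bandwidth-bound workload should look like.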
§ 03 · VRAM fit

What fits in 32 GB, really.

FP16 weights = 2 bytes × parameters. INT4 cuts that 4× with small quality loss. Full fine-tuning needs 3–4× the weight memory on top, for gradients, optimiser state, and activations.

Model          | Params   | FP16   | INT8   | INT4   | Fits on RTX 5090?
Llama 3.1 8B   | 8B       | 16 GB  | 8 GB   | 4 GB   | FP16, INT8 and INT4
Qwen 2.5 14B   | 14B      | 28 GB  | 14 GB  | 7 GB   | FP16, INT8 and INT4
Qwen 2.5 32B   | 32B      | 64 GB  | 32 GB  | 16 GB  | INT8 and INT4 only
Llama 3.1 70B  | 70B      | 140 GB | 70 GB  | 36 GB  | No
DeepSeek V3    | 671B MoE | 1.3 TB | 671 GB | 336 GB | No
Llama 3.1 405B | 405B     | 810 GB | 405 GB | 203 GB | No
Fig 3 · Memory budget per model at each precision against this card's 32 GB envelope.
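The rules of thumb above (2 bytes per parameter at FP16, 4× reduction at INT4, 3–4× extra for full fine-tuning) reduce to a quick fit check. A minimal sketch: the per-precision byte counts follow the formula, so INT4 lands at 35 GB for a 70B where Fig 3 lists ~36 GB including quantisation overhead, and the 3.5× training multiplier is an assumed midpoint of the 3–4× range:

```python
# Back-of-envelope VRAM fit check against the RTX 5090's 32 GB envelope.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}
VRAM_GB = 32  # RTX 5090 (Fig 1)

def weight_gb(params_b: float, precision: str) -> float:
    """Weight memory in GB for a model with params_b billion parameters."""
    return params_b * BYTES_PER_PARAM[precision]

def fits(params_b: float, precision: str, training: bool = False) -> bool:
    # Full fine-tuning needs ~3-4x the weight memory for gradients, optimiser
    # state, and activations; 3.5x is an assumed midpoint. LoRA (Fig 2) is
    # far lighter and not modelled here.
    need = weight_gb(params_b, precision) * (3.5 if training else 1.0)
    return need <= VRAM_GB

print(weight_gb(70, "INT4"))           # → 35.0 (over 32 GB: Fig 3 says "No")
print(fits(32, "INT4"))                # → True (16 GB of weights)
print(fits(8, "FP16", training=True))  # → False (full fine-tune needs ~56 GB)
```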
§ 04 · Compare

RTX 5090 head-to-heads.

Side-by-side spec tables and matched-quantisation throughput numbers for the comparisons people actually search for.
/hardware/rtx-5090-vs-rtx-4090

RTX 5090 vs RTX 4090

Blackwell vs Ada. 32 GB GDDR7 against 24 GB GDDR6X, at 1.27× the FP16.

/hardware/rtx-5090-vs-h200

RTX 5090 vs H200

The biggest consumer card vs a real datacenter accelerator. When does the 5090 actually catch up?

Read next

Three places to go from here.

Hub
Hardware register
Every accelerator on the leaderboard, with FP16 TFLOPS, VRAM, $/hr, and energy cost in one place.
Per-chip page
RTX 4090
Still the workhorse: 24 GB GDDR6X, $0.29/hr on Vast.ai spot.
Per-chip page
RTX 3090
Best sub-$1k ML card on the used market. 24 GB fits a 32B at INT4 with room to spare.