← Hardware · RTX 4090
NVIDIA · Consumer · Ada Lovelace · released 2022 · Issue: April 22, 2026

RTX 4090. Specs, benchmarks, $/hr.

Still the workhorse. 24 GB GDDR6X, 165 FP16 TFLOPS, and the cheapest H100-adjacent card on aggregator clouds at $0.29/hr spot. The default for anyone who doesn't need a 70B in FP16.

§ 01 · Specs

RTX 4090, specified.

Dense FP16 from the NVIDIA datasheet. Bandwidth is peak; sustained will be lower. Price reflects MSRP or used-market street price as of the date stamped at the top.

Architectural lineage · FP16 TFLOPS over recent NVIDIA generations (chart).
Vendor · NVIDIA
Tier · Consumer
Generation · Ada Lovelace
VRAM · 24 GB · GDDR6X
Memory bandwidth · 1,008 GB/s
FP16 dense · 165.2 TFLOPS
TDP · 450 W
Released · 2022
Price · $1,599 MSRP
Status · Available
Fig 1 · Single-card spec sheet. FP16 is dense (not sparse). Bandwidth is peak HBM/GDDR.
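The Fig 1 numbers imply a rough roofline balance point for the card. A minimal sketch of the arithmetic, using the peak figures above (sustained ratios will differ):

```python
# Compute-to-bandwidth ratio from the Fig 1 peak numbers. Kernels whose
# arithmetic intensity falls below this ratio are bandwidth-bound on this card.
fp16_flops = 165.2e12      # dense FP16, peak
bandwidth_bytes = 1008e9   # GDDR6X, peak

ratio = fp16_flops / bandwidth_bytes
print(f"{ratio:.0f} FLOPs per byte")  # prints "164 FLOPs per byte"
```

Single-stream LLM decoding sits far below that intensity, which is why the benchmark section below tracks memory traffic more than raw TFLOPS.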
§ 02 · Benchmarks

Eleven workloads, one card.

Throughput on the same set of repeatable workloads we use across the register. Cards in the same row use the same quantisation; p95 latency is reported in the methodology notes.

Numbers without a measurement on this chip are marked "—". Cross-card comparisons live on the head-to-head pages.

Category | Workload | Metric | RTX 4090 | Notes
LLM Inference | Llama 3.1 8B | tok/s | 95 | tokens per second · single-stream · FP16
LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 22 | tokens per second · single-stream · INT4 GPTQ
LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 30 | tokens per second · single-stream · INT4
LLM Inference | Mistral 7B | tok/s | 110 | tokens per second · single-stream · FP16
Image Generation | SDXL 1024×1024 | it/s | 4.2 | iterations per second · 30 steps · FP16
Image Generation | Flux.1 Dev | it/s | 2.1 | iterations per second · 28 steps · FP16
Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 7.8 | samples per second · seq 2k · BF16
Training | ResNet-50 · ImageNet | img/s | 1,950 | images per second · BS=256 · BF16
Computer Vision | YOLOv8x · inference | FPS | 210 | frames per second · BS=1 · FP16
Computer Vision | SAM ViT-H | masks/s | 5.8 | masks per second · 1024×1024 · FP16
Audio/Video | Whisper Large v3 | × RT | 18 | multiples of real-time · CPU offload off
Fig 2 · Per-workload throughput on a single RTX 4090. Higher is better for every metric shown.
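Combined with the $0.29/hr spot rate quoted in the intro, the single-stream numbers in Fig 2 translate into a rough cost per million tokens. A sketch of that arithmetic (single-stream is the worst case; batched serving amortises far better):

```python
# Cost per 1M generated tokens at the aggregator spot rate, from the
# single-stream tok/s figures in Fig 2.
spot_per_hr = 0.29  # USD/hr, spot

workloads = {
    "Llama 3.1 8B (FP16)": 95,    # tok/s
    "Llama 3.1 70B (INT4)": 22,   # tok/s
}

for name, tok_s in workloads.items():
    usd_per_mtok = spot_per_hr / (tok_s * 3600) * 1e6
    print(f"{name}: ${usd_per_mtok:.2f} / 1M tokens")
# Llama 3.1 8B (FP16): $0.85 / 1M tokens
# Llama 3.1 70B (INT4): $3.66 / 1M tokens
```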
§ 03 · VRAM fit

What fits in 24 GB, really.

FP16 weights = 2 bytes × parameters. INT4 cuts that 4× with small quality loss. Fine-tuning needs 3–4× more memory for gradients, optimiser state, and activations.

Model | Params | FP16 | INT8 | INT4 | Fits on RTX 4090?
Llama 3.1 8B | 8B | 16 GB | 8 GB | 4 GB | FP16, INT8 and INT4
Qwen 2.5 14B | 14B | 28 GB | 14 GB | 7 GB | INT8 and INT4 only
Qwen 2.5 32B | 32B | 64 GB | 32 GB | 16 GB | INT4 only
Llama 3.1 70B | 70B | 140 GB | 70 GB | 36 GB | No
DeepSeek V3 | 671B MoE | 1.3 TB | 671 GB | 336 GB | No
Llama 3.1 405B | 405B | 810 GB | 405 GB | 203 GB | No
Fig 3 · Memory budget per model at each precision against this card's 24 GB envelope.
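The rule of thumb behind Fig 3 can be sketched as a small estimator. This is weight memory only; it ignores KV cache, quantisation group overhead, and runtime buffers, so "fits" is an optimistic bound:

```python
# Weight-memory estimator: bytes per parameter at each precision, checked
# against this card's 24 GB envelope. Fig 3's INT4 column runs slightly
# higher (e.g. 36 GB vs 35 GB for 70B) due to quantisation overhead.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}
VRAM_GB = 24

def weight_gb(params_b, precision):
    """Weight footprint in GB for a params_b-billion-parameter model."""
    return params_b * BYTES_PER_PARAM[precision]

def fits(params_b, precision):
    return weight_gb(params_b, precision) <= VRAM_GB

print(fits(8, "FP16"))   # 16 GB  -> True
print(fits(32, "INT4"))  # 16 GB  -> True
print(fits(70, "INT4"))  # 35 GB  -> False
# Fine-tuning needs roughly 3-4x the weight memory (gradients, optimiser
# state, activations), so even 8B at FP16 is tight without LoRA.
```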
§ 04 · Compare

RTX 4090 head-to-heads.

Side-by-side spec tables and matched-quantisation throughput numbers for the comparisons people actually search for.
/hardware/rtx-5090-vs-rtx-4090

RTX 5090 vs RTX 4090

Blackwell vs Ada. 32 GB GDDR7 against 24 GB GDDR6X, at 1.27× the FP16 throughput.

Read next

Three places to go from here.

Hub
Hardware register
Every accelerator on the leaderboard, with FP16 TFLOPS, VRAM, $/hr, and energy cost in one place.
Per-chip page
RTX 5090
First consumer card with 32 GB. The ceiling for a single-PSU workstation.
Per-chip page
RTX 3090
Best sub-$1k ML card on the used market. Same 24 GB envelope, so it runs up to a 32B at INT4.