When the RTX 5090 wins.
When your worst-case workload needs more than 24 GB. 32 GB puts a heavily quantised 70B at 32k context within reach (sub-4-bit weights plus a quantised KV cache), along with comfortable LoRA on 13B models and sustained Flux generation without OOMs.
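A back-of-the-envelope VRAM budget shows how tight that is. The sketch below assumes a Llama-3.1-70B-like shape (80 layers, 8 GQA KV heads, head dimension 128); the formula and bit-widths are simplifications for illustration, not measurements.

```python
def llm_vram_gb(params_b, weight_bits, n_layers, n_kv_heads, head_dim,
                ctx_len, kv_bits=16):
    """Rough VRAM estimate: quantised weights + KV cache.

    Ignores activations, the CUDA context, and allocator fragmentation,
    so treat the result as a lower bound.
    """
    weights = params_b * 1e9 * weight_bits / 8            # bytes
    # K and V: one head_dim vector per KV head, per layer, per token.
    kv = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bits / 8
    return (weights + kv) / 1e9

# Assumed Llama-3.1-70B-like shape: 80 layers, 8 KV heads (GQA), head_dim 128.
# ~3-bit weights with an 8-bit KV cache at 32k context just squeeze under 32 GB:
print(round(llm_vram_gb(70, 3, 80, 8, 128, 32_768, kv_bits=8), 1))  # → 31.6
# True 4-bit weights with an FP16 KV cache blow well past it:
print(round(llm_vram_gb(70, 4, 80, 8, 128, 32_768), 1))             # → 45.7
```

The same estimator makes the 24 GB ceiling on the 4090 concrete: at these shapes, any fully resident 70B configuration is out of reach there.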
Blackwell vs Ada. The 5090 lifts FP16 by 1.27×, VRAM from 24 → 32 GB, and bandwidth from 1,008 → 1,792 GB/s — for $400 more MSRP and 125 W more under load. The case for upgrading is mostly about the extra 8 GB.
Datasheet specs only. Throughput on real workloads follows in §02; the gaps there track the 1.78× bandwidth uplift more closely than the 1.27× FP16 uplift, because most ML workloads are memory-bound.
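The memory-bound claim can be sanity-checked with a roofline-style ceiling: single-stream LLM decoding reads every weight once per generated token, so tokens/s is capped near bandwidth divided by model size. A minimal sketch, assuming an 8B model quantised to roughly 8 bits per weight (~8 GB) and ignoring KV-cache traffic:

```python
def decode_ceiling_tok_s(model_gb, bandwidth_gbs):
    """Single-stream decode reads every weight once per generated token,
    so tok/s is bounded above by bandwidth / model size."""
    return bandwidth_gbs / model_gb

MODEL_GB = 8  # assumed: 8B model at ~8 bits/weight
for name, bw in [("RTX 5090", 1792), ("RTX 4090", 1008)]:
    print(f"{name}: <= {decode_ceiling_tok_s(MODEL_GB, bw):.0f} tok/s")
# The two ceilings sit in the 1.78x bandwidth ratio, not the 1.27x FP16 ratio.
```

Whatever model size you plug in, the ratio of the ceilings is the bandwidth ratio, which is why the measured gaps below cluster well above 1.27×.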
| Spec | RTX 5090 | RTX 4090 |
|---|---|---|
| Vendor | NVIDIA | NVIDIA |
| Tier | Consumer | Consumer |
| Generation | Blackwell | Ada Lovelace |
| VRAM | 32 GB GDDR7 | 24 GB GDDR6X |
| Bandwidth | 1,792 GB/s | 1,008 GB/s |
| FP16 dense | 209.5 TFLOPS | 165.2 TFLOPS |
| TDP | 575 W | 450 W |
| Released | 2025 | 2022 |
| Status | Available | Available |
| Price | $1,999 MSRP | $1,599 MSRP |
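Reading the gen-over-gen ratios straight out of the table makes the trade explicit: bandwidth moves far more than compute, and FP16 per watt barely moves at all (1.27× compute for 1.28× TDP). All numbers below are from the spec table above.

```python
specs = {
    "RTX 5090": dict(vram_gb=32, bw_gbs=1792, fp16_tflops=209.5, tdp_w=575, msrp_usd=1999),
    "RTX 4090": dict(vram_gb=24, bw_gbs=1008, fp16_tflops=165.2, tdp_w=450, msrp_usd=1599),
}
new, old = specs["RTX 5090"], specs["RTX 4090"]
for key in new:
    # Each line prints the 5090-over-4090 uplift for one spec.
    print(f"{key}: {new[key] / old[key]:.2f}x")
# vram 1.33x, bandwidth 1.78x, fp16 1.27x, tdp 1.28x, msrp 1.25x
```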
Methodology: how we test.
Same model revision, same quantisation, same batch size on both cards. Where one side has no measurement we leave the cell empty rather than extrapolate.
| Category | Workload | Metric | RTX 5090 | RTX 4090 | 4090 / 5090 |
|---|---|---|---|---|---|
| LLM Inference | Llama 3.1 8B | tok/s | 140 | 95 | 0.68× |
| LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 38 | 22 | 0.58× |
| LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 48 | 30 | 0.63× |
| LLM Inference | Mistral 7B | tok/s | 165 | 110 | 0.67× |
| Image Generation | SDXL 1024×1024 | it/s | 6.5 | 4.2 | 0.65× |
| Image Generation | Flux.1 Dev | it/s | 3.4 | 2.1 | 0.62× |
| Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 12.5 | 7.8 | 0.62× |
| Training | ResNet-50 · ImageNet | img/s | 2,800 | 1,950 | 0.70× |
| Computer Vision | YOLOv8x · inference | FPS | 320 | 210 | 0.66× |
| Computer Vision | SAM ViT-H | masks/s | 9.2 | 5.8 | 0.63× |
| Audio/Video | Whisper Large v3 | × RT | 28 | 18 | 0.64× |
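Averaging the table's rows (geometric mean of the 5090-over-4090 ratios) lands around a 1.56× uplift, between the 1.27× FP16 ratio and the 1.78× bandwidth ratio:

```python
from statistics import geometric_mean

# (RTX 5090, RTX 4090) measured pairs, copied from the benchmark table above.
pairs = [(140, 95), (38, 22), (48, 30), (165, 110), (6.5, 4.2),
         (3.4, 2.1), (12.5, 7.8), (2800, 1950), (320, 210),
         (9.2, 5.8), (28, 18)]
speedups = [a / b for a, b in pairs]
print(f"geomean 5090/4090 speedup: {geometric_mean(speedups):.2f}x")  # → 1.56x
```

The geometric mean is the right average here because the rows are ratios on different scales (tok/s, it/s, FPS); an arithmetic mean would over-weight the large-ratio rows.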
The right card is the one whose envelope covers your worst-case workload — not the one with the bigger TFLOPS number.
On price/perf. $1,599 vs $1,999 buys you 79% of the FP16 and 75% of the VRAM. If your worst-case workload fits in 24 GB, the 4090 stays the value pick — especially with $0.29/hr aggregator clouds for spillover.
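One way to size the value argument: the 5090's price premium equals roughly 1,380 hours at the $0.29/hr spillover rate quoted above. This sketch ignores electricity, resale value, and the friction of moving data to a rented box.

```python
msrp_5090, msrp_4090 = 1999, 1599
cloud_rate_usd_hr = 0.29   # aggregator spot rate quoted above

delta = msrp_5090 - msrp_4090
hours = delta / cloud_rate_usd_hr
print(f"${delta} premium = {hours:.0f} cloud hours")  # → $400 premium = 1379 cloud hours
```

If your over-24-GB jobs total fewer hours than that over the card's useful life, the 4090-plus-cloud split wins on cost alone.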
Bottom line. Buy the 5090 if you need 32 GB locally; otherwise the 4090 + occasional cloud H200 hours is still the sharpest workstation strategy.