Hardware Benchmarks

GPU Performance for ML Workloads

Real-world benchmarks comparing RTX 3090, 4090, and 5090 for LLM inference, image generation, and model training.

5090 vs 3090

  • ~3x faster LLM inference
  • +33% more VRAM (32 GB vs 24 GB)
  • 1.9x the memory bandwidth (1792 GB/s vs 936 GB/s)

Specifications

Spec               RTX 3090      RTX 4090       RTX 5090
Architecture       Ampere        Ada Lovelace   Blackwell
VRAM               24 GB         24 GB          32 GB
Memory Bandwidth   936 GB/s      1008 GB/s      1792 GB/s
FP16 Performance   35.6 TFLOPS   82.6 TFLOPS    105 TFLOPS
TDP                350W          450W           575W
MSRP               $1499         $1599          $1999
Released           2020          2022           2025

ML Benchmarks

LLM Inference

Benchmark                        RTX 3090   RTX 4090   RTX 5090   5090 vs 3090
Llama 3 8B (tokens/sec)          45         95         140        3.1x
Llama 3 70B 4-bit (tokens/sec)   8          22         38         4.8x
Mistral 7B (tokens/sec)          52         110        165        3.2x
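
Throughput numbers like these depend heavily on the serving stack (llama.cpp, vLLM, TensorRT-LLM, and plain transformers all differ). As a rough point of comparison, here is a minimal timing sketch using Hugging Face transformers; the model ID and generation length are assumptions, so swap in whatever checkpoint you actually run:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # gated repo; any causal LM works here
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)

inputs = tok("Explain memory bandwidth in one paragraph.",
             return_tensors="pt").to("cuda")

model.generate(**inputs, max_new_tokens=16)  # warm-up; keeps first-run overhead out of the timing

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```

Batch-size-1 decoding is largely memory-bandwidth-bound, which is why the 5090's 1792 GB/s shows up almost directly in these numbers.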

Image Generation

Benchmark               RTX 3090   RTX 4090   RTX 5090   5090 vs 3090
SDXL 1024x1024 (it/s)   1.8        4.2        6.5        3.6x
Flux.1 Dev (it/s)       0.9        2.1        3.4        3.8x
SD 1.5 512x512 (it/s)   12         28         42         3.5x
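
The it/s figures are per-denoising-step rates. A minimal way to measure this yourself, assuming the diffusers library and the public SDXL base checkpoint:

```python
import time
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

pipe("warm-up", num_inference_steps=5)  # allocation/compile overhead happens here

steps = 30
torch.cuda.synchronize()
start = time.perf_counter()
pipe("a photo of a mountain lake at dawn",
     height=1024, width=1024, num_inference_steps=steps)
torch.cuda.synchronize()

# Wall-clock it/s; includes text encoding and VAE decode, so it reads a
# little below the per-step rate shown in the diffusers progress bar.
print(f"{steps / (time.perf_counter() - start):.2f} it/s")
```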

Training

Benchmark                                 RTX 3090   RTX 4090   RTX 5090   5090 vs 3090
Fine-tune Llama 3 8B LoRA (samples/sec)   3.2        7.8        12.5       3.9x
YOLO11x training (images/sec)             45         105        160        3.6x
ResNet-50 ImageNet (images/sec)           850        1950       2800       3.3x
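
Training throughput is usually measured over a fixed number of optimizer steps with synthetic data so the data loader isn't the bottleneck. A sketch for the ResNet-50 row, assuming mixed precision and a batch size of 128 (both assumptions; published runs vary):

```python
import time
import torch
import torchvision

# Synthetic ImageNet-shaped batch isolates GPU throughput from data loading.
model = torchvision.models.resnet50().cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()
bs = 128
x = torch.randn(bs, 3, 224, 224, device="cuda")
y = torch.randint(0, 1000, (bs,), device="cuda")

def step():
    opt.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()

for _ in range(5):  # warm-up iterations
    step()

torch.cuda.synchronize()
start = time.perf_counter()
iters = 20
for _ in range(iters):
    step()
torch.cuda.synchronize()
print(f"{bs * iters / (time.perf_counter() - start):.0f} images/sec")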

Computer Vision

Benchmark                 RTX 3090   RTX 4090   RTX 5090   5090 vs 3090
YOLOv8x inference (FPS)   95         210        320        3.4x
SAM ViT-H (masks/sec)     2.5        5.8        9.2        3.7x
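
Single-image FPS can be read straight from the per-stage timings that ultralytics reports. Note this sketch measures the inference stage only; batched inference raises throughput considerably:

```python
from ultralytics import YOLO  # pip install ultralytics

model = YOLO("yolov8x.pt")  # weights download automatically on first use
results = model("https://ultralytics.com/images/bus.jpg", device=0)

# results[0].speed holds per-stage latency in ms:
# preprocess / inference / postprocess
inference_ms = results[0].speed["inference"]
print(f"{1000 / inference_ms:.0f} FPS (inference stage only)")
```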

Audio

Benchmark                       RTX 3090   RTX 4090   RTX 5090   5090 vs 3090
Whisper Large v3 (x realtime)   8          18         28         3.5x
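
The "x realtime" factor is audio duration divided by transcription time. A sketch using the reference openai-whisper package (the audio path is a placeholder):

```python
import time
import whisper  # pip install openai-whisper

model = whisper.load_model("large-v3", device="cuda")

audio = whisper.load_audio("sample.wav")  # placeholder path; any audio file
duration_s = len(audio) / whisper.audio.SAMPLE_RATE  # load_audio resamples to 16 kHz

start = time.perf_counter()
model.transcribe(audio)
elapsed = time.perf_counter() - start

print(f"{duration_s / elapsed:.1f}x realtime")
```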

VRAM Requirements

Approximate VRAM needed to run popular models at full precision or with quantization; a back-of-envelope calculator follows the two lists below.

Fits on 24GB (3090/4090)

  • Llama 3 8B (FP16) ~16 GB
  • Llama 3 70B (4-bit) ~20 GB
  • SDXL + LoRA training ~18 GB
  • Mistral 7B (FP16) ~14 GB
  • Flux.1 Dev ~22 GB

Needs 32GB (5090)

  • Llama 3 70B (3-bit) ~28 GB
  • Mixtral 8x7B (4-bit) ~26 GB
  • SDXL full fine-tune ~28 GB
  • Large batch training ~30 GB
  • Video generation (Mochi) ~26 GB
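
These figures track a simple rule of thumb: weights take params × bits / 8 bytes, and the KV cache, activations, and CUDA context add a few GB on top. A toy calculator (the bits-per-weight values are approximations; real quantization formats average slightly above their nominal bit width):

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only footprint; KV cache and activations add a few GB on top."""
    return params_billions * bits_per_weight / 8

print(f"Llama 3 8B @ FP16:    ~{weights_gb(8, 16):.0f} GB")     # ~16 GB
print(f"Llama 3 70B @ 2-bit:  ~{weights_gb(70, 2.3):.0f} GB")   # ~20 GB
print(f"Mixtral 8x7B @ 4-bit: ~{weights_gb(46.7, 4.5):.0f} GB") # ~26 GB
```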

Which GPU Should You Get?

RTX 3090

Best Value

Still capable for most tasks. Great for hobbyists and those on a budget. Available used at significant discounts.

  • + Best price/performance used
  • + 24GB handles most models
  • - Older architecture
  • - Power hungry (350W)

RTX 4090

Best Overall

The current sweet spot. 2x faster than 3090 with better efficiency. Widely available and proven reliable.

  • + Excellent performance
  • + Better power efficiency
  • + Mature ecosystem
  • - Still 24GB VRAM limit

RTX 5090

Best Performance

The new king. 32GB VRAM opens up larger models. Best for professionals and serious researchers.

  • + 32GB VRAM (finally!)
  • + ~50% faster than 4090
  • + Latest architecture
  • - Higher price ($1999)
  • - High power (575W)

Cloud GPU Pricing

Approximate hourly rates from popular providers. Prices vary by region and availability.

Provider         RTX 3090   RTX 4090   RTX 5090
RunPod           $0.22/hr   $0.39/hr   ~$0.69/hr
vast.ai (spot)   $0.15/hr   $0.30/hr   TBD
Lambda Labs      -          $0.50/hr   TBD

* Prices as of December 2024. RTX 5090 rates are estimates based on announced launch pricing.
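
A quick way to decide between renting and buying is the break-even point: MSRP divided by the hourly rate. The sketch below uses the cheapest listed rate for each card and ignores electricity, resale value, and spot-instance interruptions:

```python
# MSRP and cheapest listed hourly rate from the tables above.
cards = {
    "RTX 3090": (1499, 0.15),
    "RTX 4090": (1599, 0.30),
    "RTX 5090": (1999, 0.69),  # estimated rate
}
for name, (msrp, rate) in cards.items():
    print(f"{name}: ~{msrp / rate:,.0f} hours rented to match MSRP")
```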

Benchmark Methodology

Benchmarks are collected from community sources, manufacturer data, and independent testing. Results vary with driver version, cooling, and system configuration. RTX 5090 numbers are based on early benchmarks and official specifications; expect updates as more data becomes available.