RTX 3090 · NVIDIA · Consumer · Ampere · released 2020
Issue: April 22, 2026

RTX 3090. Specs, benchmarks, $/hr.

Five-year-old silicon, six-month-payback economics. 24 GB of GDDR6X, 35.6 dense FP16 TFLOPS, and a used-market price that has held near $800 since the 4090 launched. Still the only sub-$1k card with enough VRAM to run a 32B model at INT4 entirely on-device; a 70B at 4-bit spills past 24 GB and needs offload (see Fig 3).
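The payback arithmetic can be sketched in a few lines. Every input here is an assumption for illustration: the price is the midpoint of the used-market range quoted above, and the rental rate is a guessed spot price for a 3090-class card, not a figure from this register.

```python
# Hypothetical payback sketch: all inputs are assumptions, not register data.
CARD_PRICE_USD = 800.0   # midpoint of the ~$700-900 used range
RENT_USD_PER_HR = 0.20   # assumed spot rate for a 3090-class card
UTILISATION = 0.9        # fraction of wall-clock hours actually billed

hours_to_payback = CARD_PRICE_USD / (RENT_USD_PER_HR * UTILISATION)
months = hours_to_payback / (24 * 30)
print(f"{hours_to_payback:.0f} billable hours ≈ {months:.1f} months")
```

At these assumed numbers the card pays for itself in roughly six months of near-continuous rental; halve the utilisation and the payback doubles.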

§ 01 · Specs

RTX 3090, specified.

Dense FP16 from the NVIDIA datasheet. Bandwidth is peak; sustained will be lower. Price reflects street MSRP or used-market as of the date stamped at the top.

Architectural lineage
FP16 TFLOPS over recent NVIDIA generations.
Vendor: NVIDIA
Tier: Consumer
Generation: Ampere
VRAM: 24 GB · GDDR6X
Memory bandwidth: 936 GB/s
FP16 dense: 35.6 TFLOPS
TDP: 350 W
Released: 2020
Price: ~$700–900 used
Status: Used market
Fig 1 · Single-card spec sheet. FP16 is dense (not sparse). Bandwidth is peak HBM/GDDR.
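The bandwidth number in the spec sheet sets a hard ceiling on single-stream LLM decode: each generated token streams the full weight tensor once, so peak tok/s is at most bandwidth divided by model size in bytes. A minimal sketch of that roofline, ignoring KV-cache and activation traffic:

```python
# Memory-bandwidth roofline for single-stream decode: every generated token
# reads the full weight tensor once, so peak tok/s <= bandwidth / model bytes.
BANDWIDTH_GBPS = 936  # peak, from the spec sheet above

def decode_ceiling(params_b: float, bytes_per_param: float) -> float:
    """Upper bound on tokens/second, ignoring KV cache and activations."""
    model_gb = params_b * bytes_per_param
    return BANDWIDTH_GBPS / model_gb

print(decode_ceiling(8, 2.0))  # 8B FP16: ~58 tok/s ceiling
print(decode_ceiling(7, 2.0))  # 7B FP16: ~67 tok/s ceiling
```

The measured numbers in § 02 (45 tok/s for Llama 3.1 8B, 52 for Mistral 7B) land at roughly 75–80% of these ceilings, which is typical for a well-tuned decode loop.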
§ 02 · Benchmarks

Eleven workloads, one card.

Throughput on the same set of repeatable workloads we use across the register. Same quantisation across cards in the same row; latency reported with p95 in the methodology notes.

Numbers without a measurement on this chip are marked "—". Cross-card comparisons live on the head-to-head pages.

Category | Workload | Metric | RTX 3090 | Notes
LLM Inference | Llama 3.1 8B | tok/s | 45 | tokens per second · single-stream · FP16
LLM Inference | Llama 3.1 70B · 4-bit | tok/s | 8 | tokens per second · single-stream · INT4 GPTQ
LLM Inference | Qwen 2.5 32B · 4-bit | tok/s | 12 | tokens per second · single-stream · INT4
LLM Inference | Mistral 7B | tok/s | 52 | tokens per second · single-stream · FP16
Image Generation | SDXL 1024×1024 | it/s | 1.8 | iterations per second · 30 steps · FP16
Image Generation | Flux.1 Dev | it/s | 0.9 | iterations per second · 28 steps · FP16
Training | Fine-tune Llama 3.1 8B LoRA | samples/s | 3.2 | samples per second · seq 2k · BF16
Training | ResNet-50 · ImageNet | img/s | 850 | images per second · BS=256 · BF16
Computer Vision | YOLOv8x · inference | FPS | 95 | frames per second · BS=1 · FP16
Computer Vision | SAM ViT-H | masks/s | 2.5 | masks per second · 1024×1024 · FP16
Audio/Video | Whisper Large v3 | × RT | 8 | multiples of real-time · CPU offload off
Fig 2 · Per-workload throughput on a single RTX 3090. Higher is better unless the metric is a price.
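Throughput converts directly into a cost per million tokens once you fix a rental rate. The rate below is an assumed spot price for illustration, not a register figure; the 45 tok/s is the single-stream Llama 3.1 8B number from Fig 2.

```python
# Hypothetical $/Mtok: the rental rate is an assumed figure, not register data.
RENT_USD_PER_HR = 0.20  # assumed spot rate for a 3090-class card
TOKENS_PER_SEC = 45     # Llama 3.1 8B · FP16 · single-stream (Fig 2)

tokens_per_hour = TOKENS_PER_SEC * 3600
usd_per_mtok = RENT_USD_PER_HR / tokens_per_hour * 1e6
print(f"${usd_per_mtok:.2f} per million tokens")
```

Single-stream is the worst case for this arithmetic; batched serving raises aggregate tok/s several-fold and drops the per-token cost with it.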
§ 03 · VRAM fit

What fits in 24 GB, really.

FP16 weights = 2 bytes × parameters. INT4 cuts that 4× with small quality loss. Fine-tuning needs 3–4× more memory for gradients, optimiser state, and activations.

Model | Params | FP16 | INT8 | INT4 | Fits on RTX 3090?
Llama 3.1 8B | 8B | 16 GB | 8 GB | 4 GB | FP16, INT8 and INT4
Qwen 2.5 14B | 14B | 28 GB | 14 GB | 7 GB | INT8 and INT4 only
Qwen 2.5 32B | 32B | 64 GB | 32 GB | 16 GB | INT4 only
Llama 3.1 70B | 70B | 140 GB | 70 GB | 36 GB | No
DeepSeek V3 | 671B MoE | 1.3 TB | 671 GB | 336 GB | No
Llama 3.1 405B | 405B | 810 GB | 405 GB | 203 GB | No
Fig 3 · Memory budget per model at each precision against this card's 24 GB envelope.
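The fit column above follows from the bytes-per-parameter rule. A minimal sketch, assuming a ~10% overhead factor for the KV cache and runtime buffers (the exact overhead varies by runtime and context length):

```python
# Memory-budget sketch behind Fig 3: weight bytes = params × bytes/param,
# plus an assumed ~10% overhead for KV cache and runtime buffers.
VRAM_GB = 24
BYTES = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def fits(params_b: float, precision: str, overhead: float = 1.1) -> bool:
    """True if the weights (plus overhead) fit in this card's 24 GB."""
    return params_b * BYTES[precision] * overhead <= VRAM_GB

print(fits(8, "FP16"))   # True:  16 GB of weights
print(fits(32, "INT4"))  # True:  16 GB of weights
print(fits(70, "INT4"))  # False: 35 GB, needs offload or a second card
```

Long contexts shrink the headroom further: the KV cache grows linearly with sequence length, so a model that "fits" at 2k tokens can spill at 32k.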
§ 04 · Compare

RTX 3090 head-to-heads.

Side-by-side spec tables and matched-quantisation throughput numbers for the comparisons people actually search for.

No head-to-head pages for this chip yet — propose one in /contribute.

Read next

Three places to go from here.

Hub
Hardware register
Every accelerator on the leaderboard, with FP16 TFLOPS, VRAM, $/hr, and energy cost in one place.
Per-chip page
RTX 5090
First consumer card with 32 GB. The ceiling for a single-PSU workstation.
Per-chip page
RTX 4090
Still the workhorse: 24 GB GDDR6X, $0.29/hr on Vast.ai spot.