Best local AI model for RTX 5090 32GB.

The 5090's 32GB buys you a higher-precision quant and longer context than a 24GB card, not a bigger model. Run Qwen3.6-35B-A3B at higher quality. Do not jump to an older 70B just because more memory is available.


01 / Recommendation

Run this size class.

Recommended default

Qwen3.6-35B-A3B (higher quant)

Use a roughly Q5-level quant, or FP4 where the runtime supports it, with a practical context of 32k-64k tokens. This is the highest-scoring current open-weight model that fits this card cleanly. It was selected by benchmark, then fit, then freshness, not by parameter count.
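To make the fit claim concrete, here is a minimal back-of-the-envelope sketch of weight-only memory for a 35B-parameter model at a few precisions. The bits-per-weight figures (4.5 for a 4-bit-class format with scaling overhead, 5.5 for a Q5-ish quant) and the 32 GiB budget are illustrative assumptions, not measured file sizes, and the estimate ignores KV cache and runtime overhead.

```python
# Rough weight-only VRAM estimate. The bits-per-weight values below are
# illustrative assumptions, not measured sizes for any specific quant file.
GIB = 1024 ** 3
VRAM_GIB = 32.0  # assumption: treat the card's 32GB as a 32 GiB budget


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


# Hypothetical precision labels paired with assumed effective bits per weight.
candidates = [
    ("4-bit class (~4.5 bpw)", 4.5),
    ("Q5-ish (~5.5 bpw)", 5.5),
    ("BF16 (16 bpw)", 16.0),
]

for label, bpw in candidates:
    weights = weight_gib(35, bpw)
    headroom = VRAM_GIB - weights  # what is left for KV cache and overhead
    verdict = "fits" if headroom > 0 else "does not fit"
    print(f"{label}: {weights:.1f} GiB weights, {headroom:+.1f} GiB headroom ({verdict})")
```

Under these assumptions the weights come to roughly 18 GiB at 4.5 bpw and roughly 22 GiB at 5.5 bpw, which is why the extra memory over a 24GB card goes to quant quality and context rather than to a larger model.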

Benchmark anchor

Same Qwen3.6-35B-A3B score profile (MMLU-Pro 85.6/85.0, GPQA Diamond 84.9/84.8); NVFP4 loses little vs BF16, which matters for Blackwell-era deployment.

Evidence

NVFP4 numbers show low degradation vs BF16; on 32GB the lever is quality and context, not parameter count.

02 / Alternates

Other realistic picks.

Qwen3.6-35B-A3B at FP4 for context headroom

Qwen3-32B dense at higher quant

70B only as a compromised low-quant test
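The trade-off between these alternates can be sketched by inverting the same arithmetic: given a memory budget, what is the highest average bits per weight each size class can afford? This is a rough illustration only. The 4 GiB reserve for KV cache and runtime overhead is an assumed placeholder, and real quant formats add per-block metadata that this ignores.

```python
# Highest average bits-per-weight that fits a memory budget, weights only.
# RESERVE_GIB is an assumed placeholder for KV cache and runtime overhead.
GIB = 1024 ** 3
VRAM_GIB = 32.0
RESERVE_GIB = 4.0


def max_bits_per_weight(params_billion: float,
                        vram_gib: float = VRAM_GIB,
                        reserve_gib: float = RESERVE_GIB) -> float:
    """Upper bound on average bits per weight for a given parameter count."""
    usable_bytes = (vram_gib - reserve_gib) * GIB
    return usable_bytes * 8 / (params_billion * 1e9)


for name, params_b in [("35B-class", 35), ("32B dense", 32), ("70B dense", 70)]:
    print(f"{name}: up to ~{max_bits_per_weight(params_b):.1f} bits per weight")
```

With these assumed numbers, a 70B model is limited to about 3.4 bits per weight, while the 32B and 35B classes can afford roughly 7, which is the sense in which 70B on this card is a compromised low-quant test.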

03 / More GPUs

Compare another card.

RTX 3060 12GB

RTX 4060 Ti 16GB

RTX 5080 16GB

RTX 3090 24GB

RTX 4090 24GB