Recommended default
Serve Qwen3.6-35B-A3B; switch to a modern 70B/72B model only if it wins your own evals.
Use FP8, INT8, or a high-quality 4-bit quantization. This is the highest-scoring current open-weight model that fits this card cleanly. It was selected by benchmark score first, then fit, then freshness, not by parameter count.
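The selection rule above can be sketched in a few lines. This is a minimal illustration, not the method actually used to produce the recommendation: the `Candidate` type, the `pick` helper, the candidate names, scores, dates, and the 48 GB VRAM figure are all hypothetical placeholders. It assumes "fit" means the quantized weights fit in VRAM, estimated as parameters times bits per weight divided by 8, and it ignores KV cache and runtime overhead unless you pass `headroom_gb`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    total_params_b: float  # total parameters in billions (for MoE, count all experts)
    score: float           # whichever benchmark or eval score you trust
    released: date


def weight_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Rough weight footprint in GB: params * bits / 8. Weights only."""
    return total_params_b * bits_per_weight / 8


def pick(candidates: Sequence[Candidate], vram_gb: float,
         bits_per_weight: float, headroom_gb: float = 0.0) -> Optional[Candidate]:
    """Highest score among models that fit; newer release breaks ties."""
    fitting = [
        c for c in candidates
        if weight_gb(c.total_params_b, bits_per_weight) + headroom_gb <= vram_gb
    ]
    if not fitting:
        return None
    # Parameter count is deliberately absent from the sort key.
    return max(fitting, key=lambda c: (c.score, c.released))


# Hypothetical candidates and scores, for illustration only.
candidates = [
    Candidate("hypothetical-old-70b", 70, 60.0, date(2023, 1, 1)),
    Candidate("hypothetical-new-35b", 35, 80.0, date(2025, 1, 1)),
    Candidate("hypothetical-huge-400b", 400, 90.0, date(2025, 6, 1)),
]

best = pick(candidates, vram_gb=48, bits_per_weight=8)
print(best.name if best else "nothing fits")
```

Under these assumptions a 35B model needs roughly 35 GB of weights at 8 bits and 17.5 GB at 4 bits. Note that a mixture-of-experts model such as an "A3B" variant still has to hold all of its weights in memory, even though only a fraction is active per token.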
Benchmark anchor
Qwen3.6-35B-A3B: MMLU-Pro 85.6/85.0, GPQA Diamond 84.9/84.8. The Artificial Analysis (AA) Intelligence Index is currently led by Kimi K2.6, MiMo-V2.5-Pro, and DeepSeek V4 Pro: model generation matters more than parameter count.
Evidence
The top open-weight models in the Artificial Analysis rankings (Kimi K2.6, MiMo-V2.5-Pro, DeepSeek V4 Pro) are all recent releases, which illustrates that model generation beats old parameter-count heuristics.