Recommended default
Qwen3.6-35B-A3B (BF16/FP8/INT8 or high-quality 4-bit quant)
Use BF16, FP8, INT8, or a high-quality 4-bit quant. This is the highest-scoring current open-weight model that fits this card cleanly. It was selected by benchmark score first, then fit, then freshness, not by parameter count.
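The selection order can be sketched as a filter plus a lexicographic sort. This is a minimal illustration, not the ranking code behind this card: the `Candidate` fields, the `headroom_gb` reserve, and the sample entries are all hypothetical, and "fit" is read here as "weights plus a reserve must fit, and a smaller footprint breaks ties".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    name: str
    benchmark: float    # aggregate benchmark score, higher is better
    weights_gb: float   # estimated weight footprint at the chosen precision
    released: date      # release date, newer is better
    params_b: float     # parameter count in billions; deliberately not ranked on


def pick_default(candidates, vram_gb, headroom_gb=4.0):
    """Return the top-ranked candidate that fits, or None if nothing fits."""
    # Hard filter: weights plus a reserve (hypothetical 4 GB) must fit in VRAM.
    fitting = [c for c in candidates if c.weights_gb + headroom_gb <= vram_gb]
    if not fitting:
        return None
    # Benchmark first, then the cleaner fit (smaller footprint), then freshness.
    return max(fitting, key=lambda c: (c.benchmark, -c.weights_gb, c.released))


# Hypothetical entries, for illustration only.
pool = [
    Candidate("model-a", 80.0, 20.0, date(2025, 6, 1), 30),
    Candidate("model-b", 75.0, 36.0, date(2024, 1, 1), 70),
    Candidate("model-c", 90.0, 70.0, date(2025, 1, 1), 120),
]
choice = pick_default(pool, vram_gb=40.0)
print(choice.name if choice else "nothing fits")
```

Parameter count is carried on the record but never enters the sort key, which is the point of the rule: a larger model only wins if it scores higher and still fits.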
Benchmark anchor
MMLU-Pro 85.6 / 85.0 · GPQA Diamond 84.9 / 84.8 · AIME 2025 89.2 / 88.8 (each pair is BF16 / NVFP4).
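To see how much the 4-bit format costs, the pairs can be compared directly. A small sketch, assuming every pair in the benchmark line is ordered BF16 first and NVFP4 second; the helper name `quantization_drop` is made up for this example.

```python
# Scores copied from the benchmark line: (BF16, NVFP4).
# Assumption: all three pairs follow the same BF16 / NVFP4 order.
scores = {
    "MMLU-Pro": (85.6, 85.0),
    "GPQA Diamond": (84.9, 84.8),
    "AIME 2025": (89.2, 88.8),
}


def quantization_drop(bf16, nvfp4):
    """Return (points lost, fraction of the BF16 score retained)."""
    drop = round(bf16 - nvfp4, 1)   # round away floating-point noise
    retained = nvfp4 / bf16
    return drop, retained


for name, (bf16, nvfp4) in scores.items():
    drop, retained = quantization_drop(bf16, nvfp4)
    print(f"{name}: {drop:.1f} points lost, {retained:.1%} retained")
```

On these three benchmarks the gap is at most 0.6 points, which is the basis for treating a high-quality 4-bit quant as a reasonable choice.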
Evidence
Benchmark evidence for Qwen3.6-35B-A3B outranks legacy 70B baselines, and 40GB is comfortable for serving the 35B model at a high-quality quantization.
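A rough fit check for the 40GB figure, as a back-of-the-envelope sketch only. It assumes 35 billion total parameters (read off the "35B" in the model name), counts weights alone, uses decimal gigabytes, and ignores KV cache, activations, and the per-block scale overhead that real quantized checkpoints carry.

```python
PARAMS = 35e9    # assumption: total parameters, from the "35B" in the name
VRAM_GB = 40.0   # card size quoted in the evidence line

# Nominal storage per parameter; real quantized formats add some scale overhead.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT8": 1.0, "4-bit": 0.5}


def weights_gb(params, bytes_per_param):
    """Weight footprint in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    gb = weights_gb(PARAMS, nbytes)
    print(f"{precision}: {gb:.1f} GB weights, {VRAM_GB - gb:+.1f} GB vs. {VRAM_GB:.0f} GB card")
```

Under these assumptions the 4-bit weights leave the most room for KV cache, the 8-bit formats fit with a few gigabytes to spare, and the BF16 weights alone exceed 40GB, so that row would need offloading or more than one card.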