Recommended default
Qwen3.6-35B-A3B for high-throughput serving, or a modern 70B-class model
Serve with FP8 or INT8 quantization, tensor parallelism, or MoE routing as needed. This is the highest-scoring current open-weight model that fits this card cleanly; candidates are ranked by benchmark score, then fit, then freshness, not by parameter count.
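The ranking rule (benchmark, then fit, then freshness) can be sketched as a lexicographic sort key. This is a minimal illustration, not the selection code behind this card: the `Candidate` fields, the idea of expressing fit as leftover memory in GB, and the placeholder entries are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float          # benchmark score, higher is better
    fit_margin_gb: float  # assumed fit measure: memory left on the card, negative if it does not fit
    released: date        # release date, newer is fresher


def pick_default(candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the best candidate that fits, or None if nothing fits."""
    # Only models that fit the card are eligible at all.
    fitting = [c for c in candidates if c.fit_margin_gb >= 0]
    # Tuples compare left to right: score first, then fit, then freshness.
    # Parameter count is deliberately absent from the key.
    return max(
        fitting,
        key=lambda c: (c.score, c.fit_margin_gb, c.released),
        default=None,
    )


# Placeholder entries for illustration only, not real benchmark data.
candidates = [
    Candidate("model-a", 80.0, 10.0, date(2024, 1, 1)),
    Candidate("model-b", 85.0, 5.0, date(2024, 6, 1)),
    Candidate("model-c", 90.0, -4.0, date(2024, 9, 1)),  # best score, does not fit
    Candidate("model-d", 85.0, 5.0, date(2024, 3, 1)),
]
best = pick_default(candidates)
```

With these placeholders, `model-c` is excluded because it does not fit, and `model-b` beats `model-d` only on freshness since their score and fit are tied.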
Benchmark anchor
Qwen3.6 in NVFP4 shows low degradation against BF16 (MMLU-Pro 85.0 vs 85.6, a 0.6-point drop), which suits Hopper/Blackwell-style quantized serving.
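To put the two MMLU-Pro figures in perspective, the gap can be computed in absolute and relative terms. A small sketch, assuming 85.6 is the BF16 baseline and 85.0 the NVFP4 result, as the ordering in the text suggests; the `degradation` helper is a name chosen for this example.

```python
def degradation(baseline: float, quantized: float) -> tuple[float, float]:
    """Return (absolute drop in points, relative drop in percent)."""
    absolute = baseline - quantized
    relative_pct = 100.0 * absolute / baseline
    return absolute, relative_pct


# Scores taken from the line above; the BF16/NVFP4 assignment is an assumption.
abs_drop, rel_drop = degradation(baseline=85.6, quantized=85.0)
print(f"{abs_drop:.1f} points, {rel_drop:.2f}% relative")
```

A drop of this size is well under one percent of the baseline score.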
Evidence
The Qwen3.6 NVFP4 results show low degradation relative to BF16, which matches the Hopper/Blackwell quantized-serving case better than legacy 70B baselines do.