Recommended default
GLM-5 / Kimi K2.6 / MiniMax-M2/M3-class
Run with FP4/FP8 quantization, tensor parallelism, or provider-native quantization. These are the highest-scoring current open-weight models that fit this card cleanly, selected by benchmark score first, then fit, then freshness, not by parameter count.
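The selection order above (benchmark, then fit, then freshness) can be read as a lexicographic sort key. The following is a minimal sketch under that reading, not the card's actual ranking code. The `Candidate` type, the `pick_default` helper, and the `fits_cleanly` and `release_order` values are hypothetical placeholders; only the two SWE-bench Verified scores come from this card, and the real ranking may use a different or composite benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    swe_bench_verified: float  # score as quoted in this card
    fits_cleanly: bool         # hypothetical: does the model fit the hardware?
    release_order: int         # hypothetical: higher means more recent


def pick_default(candidates):
    """Return the top candidate by benchmark, then fit, then freshness.

    Tuples compare left to right, so fit only matters when benchmark
    scores tie, and freshness only when both earlier keys tie.
    Parameter count is deliberately not part of the key.
    """
    return max(
        candidates,
        key=lambda c: (c.swe_bench_verified, c.fits_cleanly, c.release_order),
    )


# Scores are from the card; the fit and freshness values are placeholders.
candidates = [
    Candidate("GLM-5", 77.8, fits_cleanly=True, release_order=1),
    Candidate("Kimi K2.6", 80.2, fits_cleanly=True, release_order=1),
]
default = pick_default(candidates)
```

With these inputs the first key alone decides the result, because the two SWE-bench Verified scores differ. If fit should instead act as a hard filter, drop non-fitting candidates before calling `pick_default`.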
Benchmark anchor
GLM-5: GPQA-Diamond 86.0 · SWE-bench Verified 77.8 · SWE-bench Multilingual 73.3. Kimi K2.6: LiveCodeBench v6 89.6 · SWE-bench Verified 80.2.
Evidence
GLM-5 and Kimi K2.6 report frontier 2026 coding/reasoning scores (SWE-bench Verified 77.8 and 80.2, respectively). This card is large-MoE hardware, not a 70B host.