Recommended default
Kimi K2.6 / GLM-5 / MiniMax-M2-class (quantized or sharded)
Run with FP8 or INT8 quantization, tensor parallelism, or MoE routing. These are the highest-scoring current open-weight models that fit this card cleanly, selected by benchmark score first, then fit, then freshness, not by parameter count.
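The selection order above (benchmark, then fit, then freshness) can be read as a lexicographic sort over candidates that fit at all. The sketch below is one possible reading, not the page's actual ranking code. The Candidate fields, the pick_default helper, and the model-a to model-d entries with their numbers are all hypothetical and chosen only to show the tie-breaking.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    name: str            # hypothetical label, not a real model
    score: float         # headline benchmark score (illustrative value)
    headroom_gb: float   # memory left after loading; negative means no fit
    released: date       # release date, used only as the last tiebreaker


def pick_default(candidates):
    """Return the best candidate that fits, or None if nothing fits."""
    fitting = [c for c in candidates if c.headroom_gb >= 0]
    if not fitting:
        return None
    # Tuples compare lexicographically: score, then headroom, then date.
    # Parameter count is deliberately absent from the key.
    return max(fitting, key=lambda c: (c.score, c.headroom_gb, c.released))


candidates = [
    Candidate("model-a", 80.0, 12.0, date(2025, 1, 1)),
    Candidate("model-b", 80.0, 12.0, date(2025, 6, 1)),   # same score and fit, newer
    Candidate("model-c", 85.0, -40.0, date(2025, 9, 1)),  # top score, does not fit
    Candidate("model-d", 70.0, 60.0, date(2025, 9, 1)),   # fits easily, lower score
]

print(pick_default(candidates).name)  # model-b
```

Here model-c is excluded because it does not fit, model-d loses on score despite its headroom, and model-b beats model-a only on freshness.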
Benchmark anchor
Kimi K2.6: SWE-bench Verified 80.2 · LiveCodeBench v6 89.6 · AIME 2026 96.4 · HMMT 2026 92.7 (model card).
Evidence
Kimi K2.6 reports SWE-bench Verified 80.2 and LiveCodeBench v6 89.6 (model card). Large MoE models serving long contexts need memory for both the full set of expert weights and the KV cache, which is exactly where H200 memory helps.
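The memory argument can be made concrete with a rough weight-footprint estimate. The sketch below assumes 141 GB per H200 and a hypothetical per-GPU reserve for KV cache and activations. The helper names and the 1000B parameter count are illustrative only and are not the published size of any model named above. All MoE experts are counted, since they must stay resident even though only a few are active per token.

```python
import math

H200_MEMORY_GB = 141  # HBM3e capacity of one H200


def weight_gb(params_billion, bits_per_param):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def min_gpus(params_billion, bits_per_param, reserve_gb=20.0,
             gpu_gb=H200_MEMORY_GB):
    """Smallest GPU count whose combined usable memory holds the weights.

    reserve_gb is a hypothetical per-GPU allowance for KV cache and
    activations; tune it for the actual context length and batch size.
    """
    usable_gb = gpu_gb - reserve_gb
    return math.ceil(weight_gb(params_billion, bits_per_param) / usable_gb)


# Illustrative 1000B-parameter MoE, not the size of any model named above.
for label, bits in [("BF16", 16), ("FP8/INT8", 8)]:
    print(label, weight_gb(1000, bits), "GB of weights, at least",
          min_gpus(1000, bits), "GPUs")
```

This is a lower bound only. Real deployments usually round up to a tensor-parallel degree the serving stack supports, and long contexts can push the KV-cache reserve well above the default used here.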