Best local AI model for H200 141GB.

This is the card where the large MoE models of 2026 start to make sense. The H200's 141 GB is best spent on long context and large MoE models, not on running 4090-sized models faster. Even so, a giant MoE can be tight on a single H200, so sharding it across several cards is often the better choice.
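Every expert in an MoE has to stay resident in memory even though only a few run per token, so total parameter count sets the memory bill. The arithmetic below is a minimal sketch that shows why sharding wins. It assumes a hypothetical 1T-total-parameter MoE, 1 GB = 1e9 bytes, and 25% of each card held back for KV cache and activations. Both numbers are illustrative, and the sketch ignores quantization scales and runtime overhead.

```python
import math

H200_GB = 141  # per-card memory as stated on this page
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}


def weight_gb(total_params: float, precision: str) -> float:
    """Approximate weight footprint in GB (1e9 bytes), ignoring scales and metadata."""
    return total_params * BYTES_PER_PARAM[precision] / 1e9


def min_gpus(total_params: float, precision: str, headroom: float = 0.25) -> int:
    """Fewest H200s that hold all weights after reserving `headroom` of each card
    (hypothetical default) for KV cache and activations."""
    usable_per_gpu = H200_GB * (1.0 - headroom)
    return math.ceil(weight_gb(total_params, precision) / usable_per_gpu)


if __name__ == "__main__":
    params = 1.0e12  # hypothetical MoE: total (not active) parameters
    for p in ("bf16", "fp8", "int4"):
        print(f"{p}: {weight_gb(params, p):.0f} GB of weights -> {min_gpus(params, p)} x H200")
```

Under these assumptions, FP8 weights alone need about ten H200s. A real deployment would then round that up to a parallel layout the serving stack supports, which is often a power of two.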


01 / Recommendation

Run this size class.

Recommended default

Kimi K2.6 / GLM-5 / MiniMax-M2-class (quantized or sharded)

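Run it with FP8 or INT8 quantization, tensor parallelism across several GPUs, or MoE expert routing across GPUs (expert parallelism). This is the highest-scoring current open-weight model that runs on this card once quantized or sharded. Picks are ranked by benchmark score first, then by fit, then by freshness, not by parameter count.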
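Tensor parallelism is what lets one model span several H200s. The sketch below shows a column-parallel split of a single linear layer, the Megatron-LM style of sharding. How a given serving stack shards any particular layer is an assumption here. Each "GPU" stores only a slice of the weight columns, which is where the memory saving comes from. Plain Python lists stand in for GPU tensors, and list concatenation stands in for the all-gather.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense matmul: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split w (in_features x out_features) into `parts` equal column shards, one per GPU."""
    cols = len(w[0])
    assert cols % parts == 0, "out_features must divide evenly across GPUs"
    step = cols // parts
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(parts)]


def column_parallel_forward(x: Matrix, shards: List[Matrix]) -> Matrix:
    # Each GPU multiplies the full input by its own column shard...
    partial = [matmul(x, shard) for shard in shards]
    # ...then an all-gather concatenates the partial outputs along the feature axis.
    return [sum((p[i] for p in partial), []) for i in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
    w = [[1.0, 0.0, 2.0, -1.0],
         [0.0, 1.0, 1.0, 3.0],
         [2.0, -2.0, 0.0, 1.0]]
    full = matmul(x, w)
    sharded = column_parallel_forward(x, split_columns(w, 2))
    print(sharded == full)
```

Each shard holds half the weights, but the combined output matches the unsharded layer. The cost is the communication needed to gather the partial results.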

Benchmark anchor

Kimi K2.6: SWE-bench Verified 80.2 · LiveCodeBench v6 89.6 · AIME 2026 96.4 · HMMT 2026 92.7 (model card).

Evidence

Kimi K2.6 reports SWE-bench Verified 80.2 and LiveCodeBench v6 89.6. Large MoE models serving long contexts are exactly the workload where the H200's extra memory helps.

02 / Alternates

Other realistic picks.

GLM-5

MiniMax-M2-class

Modern 70B-class for cheaper single-GPU serving
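A 70B-class model at BF16 needs about 140 GB for weights alone, which fills the card. FP8 halves that and frees the rest for KV cache. The budget below is a minimal sketch under several assumptions. It uses a Llama-3-70B-style shape (80 layers, 8 KV heads of dimension 128) with a BF16 KV cache, and a hypothetical fixed 8 GB reserve for activations and runtime. Check your model's config before relying on the numbers.

```python
H200_GB = 141  # per-card memory as stated on this page


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    # Keys and values (factor 2) for every layer and every KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(params: float, weight_bytes_per_param: float, kv_per_token: int,
                      reserve_gb: float = 8.0, gpu_gb: float = H200_GB) -> int:
    """Tokens of KV cache (shared by all concurrent requests) that fit after the
    weights and a hypothetical fixed reserve; 0 if the weights alone don't fit."""
    free_bytes = (gpu_gb - reserve_gb) * 1e9 - params * weight_bytes_per_param
    return max(0, int(free_bytes // kv_per_token))


if __name__ == "__main__":
    per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_elem=2)
    print("KV bytes per token:", per_token)
    print("FP8 weights:", max_cached_tokens(70e9, 1.0, per_token), "tokens of cache")
    print("BF16 weights:", max_cached_tokens(70e9, 2.0, per_token), "tokens of cache")
```

Under these assumptions, FP8 leaves room for roughly 190K cached tokens, shared across all concurrent requests. BF16 leaves none.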

03 / More GPUs

Compare another card.

RTX 3060 12GB · RTX 4060 Ti 16GB · RTX 5080 16GB · RTX 3090 24GB · RTX 4090 24GB