Model fit guide / H200 141GB · Benchmark-first pick · Updated June 3, 2026
141 GB VRAM - Datacenter

Best local AI model for H200 141GB.

This is the tier where 2026's large MoE models start to make sense. The H200's memory is best spent on long context and large MoE models, not on acting as a faster 4090. A single GPU can still be tight for a giant MoE, so sharding across several GPUs is often the better option.
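To make "tight on a single GPU" concrete, here is a rough back-of-the-envelope sketch of weight memory versus the card's 141 GB. The parameter counts, the 20% headroom reserved for KV cache and activations, and the function names are illustrative assumptions, not figures from any model card.

```python
import math

H200_VRAM_GB = 141  # VRAM figure from this guide


def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 params * bytes_per_param / 1e9 bytes-per-GB
    simplifies to params_billion * bytes_per_param.
    """
    return params_billion * bytes_per_param


def min_gpus(params_billion: float, bytes_per_param: float,
             vram_gb: float = H200_VRAM_GB, headroom: float = 0.2) -> int:
    """Smallest GPU count whose usable memory holds the weights.

    headroom is an ASSUMED fraction kept free for KV cache, activations,
    and runtime overhead; real requirements depend on context length
    and the serving stack.
    """
    usable_per_gpu = vram_gb * (1 - headroom)
    return math.ceil(weight_gb(params_billion, bytes_per_param) / usable_per_gpu)


# Hypothetical sizes: a 70B dense model and a 1000B (1T) total-parameter MoE.
for label, params_b in [("70B dense", 70), ("1T MoE", 1000)]:
    for fmt, nbytes in [("FP16", 2.0), ("FP8/INT8", 1.0), ("4-bit", 0.5)]:
        print(f"{label:10s} {fmt:9s} ~{weight_gb(params_b, nbytes):6.0f} GB "
              f"-> at least {min_gpus(params_b, nbytes)} GPU(s)")
```

Under these assumptions a 70B-class model fits on one card at 8-bit precision, while a trillion-parameter MoE needs several cards even when quantized. That is the arithmetic behind preferring sharding for the largest models.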

01 / Recommendation

Run this size class.

Recommended default

Kimi K2.6 / GLM-5 / MiniMax-M2-class (quantized or sharded)

Use FP8 or INT8 quantization, tensor parallelism, or MoE expert sharding. The pick is the highest-scoring current open-weight model class that fits this card once quantized or sharded. It is selected by benchmark first, then fit, then freshness, not by parameter count.
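The "benchmark, then fit, then freshness" ordering can be read as a lexicographic sort. The sketch below is one possible interpretation, not the guide's actual selection code. The `Candidate` fields and the sample entries are hypothetical placeholders rather than real model data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    name: str
    benchmark: float  # headline benchmark score, higher is better
    fits: bool        # True if it fits the card (possibly quantized or sharded)
    released: date    # newer release wins the final tie-break


def rank(candidates):
    """Sort by benchmark, then fit, then freshness (all descending).

    Parameter count is deliberately absent from the sort key.
    """
    return sorted(candidates,
                  key=lambda c: (c.benchmark, c.fits, c.released),
                  reverse=True)


# Hypothetical candidates, for illustration only.
pool = [
    Candidate("a", 80.0, True, date(2026, 1, 1)),
    Candidate("b", 80.0, False, date(2026, 5, 1)),
    Candidate("c", 75.0, True, date(2026, 6, 1)),
    Candidate("d", 80.0, True, date(2026, 3, 1)),
]
ranked_names = [c.name for c in rank(pool)]
print(ranked_names)
```

Because the key is a tuple, fit only matters among candidates with equal benchmark scores, and release date only matters among candidates that also tie on fit.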

Benchmark anchor

Kimi K2.6: SWE-bench Verified 80.2 · LiveCodeBench v6 89.6 · AIME 2026 96.4 · HMMT 2026 92.7 (model card).

Evidence

Kimi K2.6 reports SWE-bench Verified 80.2 and LiveCodeBench v6 89.6. Large MoE models serving long contexts are exactly the workload where the H200's memory helps.

02 / Alternates

Other realistic picks.

GLM-5

MiniMax-M2-class

Modern 70B-class for cheaper single-GPU serving

03 / More GPUs

Compare another card.

RTX 3060 12GB · RTX 4060 Ti 16GB · RTX 5080 16GB · RTX 3090 24GB · RTX 4090 24GB