Best local AI model for H100 80GB.

The H100 is throughput-oriented serving hardware. Choose a model by benchmark results and serving stack, not by picking the largest legacy model. Qwen3.6-35B-A3B serves cheaply at high throughput; a modern, benchmark-selected 70B-class model is the alternative when the workload calls for one.


01 / Recommendation

Run this size class.

Recommended default

Qwen3.6-35B-A3B for high throughput, or a modern 70B-class model

Serve with FP8 or INT8 quantization, tensor parallelism, or MoE routing. This is the highest-scoring current open-weight option that fits this card cleanly, selected by benchmark first, then fit, then freshness, not by parameter count.
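The "fit" step can be sanity-checked with simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below is a rough estimate, not a measurement. It assumes the 35B in the model name is the total parameter count (an MoE model keeps all experts resident, so memory follows total rather than active parameters), treats NVFP4 as 0.5 bytes per parameter before scale overhead, and reserves a hypothetical 16 GB of the 80 GB for KV cache and activations. The function names and the headroom figure are illustrative, not taken from any serving stack.

```python
# Rough weight-memory fit check for a single 80 GB card.
# Assumptions (not from the source): bytes-per-parameter values ignore
# quantization scale overhead, and 16 GB is held back for KV cache,
# activations, and runtime buffers.

BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8": 1.0,
    "int8": 1.0,
    "nvfp4": 0.5,  # 4-bit payload only; block scales add a little more
}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight size in GB (1 GB = 10^9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str,
         vram_gb: float = 80.0, headroom_gb: float = 16.0) -> bool:
    """True if the weights fit while leaving headroom_gb free."""
    return weight_gb(params_billion, precision) <= vram_gb - headroom_gb


if __name__ == "__main__":
    for name, size in [("35B-class MoE", 35.0), ("70B-class dense", 70.0)]:
        for precision in ("bf16", "fp8", "nvfp4"):
            gb = weight_gb(size, precision)
            verdict = "fits" if fits(size, precision) else "does not fit"
            print(f"{name:16s} {precision:6s} ~{gb:5.1f} GB weights: {verdict}")
```

Under these assumptions a 35B model at FP8 leaves ample room for KV cache, while a 70B model at FP8 or INT8 holds its weights in 80 GB but with little headroom, which is where tensor parallelism across cards or a 4-bit format comes in.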

Benchmark anchor

Qwen3.6 in NVFP4 shows low degradation against BF16 (MMLU-Pro 85.0 vs. 85.6), which suits Hopper/Blackwell-style quantized serving.
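To put the gap in proportion, the two MMLU-Pro scores can be turned into an absolute and a relative drop. This is a small arithmetic sketch; it reads the quoted pair as NVFP4 = 85.0 and BF16 = 85.6, following the order in the sentence above, and the function name is illustrative.

```python
def quantization_drop(baseline: float, quantized: float) -> tuple[float, float]:
    """Return (absolute drop in points, relative drop in percent)."""
    absolute = baseline - quantized
    relative_pct = absolute / baseline * 100.0
    return absolute, relative_pct


# Scores quoted above: BF16 baseline 85.6, NVFP4 85.0 (MMLU-Pro).
abs_drop, rel_drop = quantization_drop(baseline=85.6, quantized=85.0)
print(f"{abs_drop:.1f} points absolute, {rel_drop:.2f}% relative")
```

That works out to 0.6 points, or under one percent of the baseline score.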

Evidence

The low degradation of Qwen3.6 in NVFP4 relative to BF16 fits Hopper/Blackwell quantized serving better than legacy 70B baselines do.

02 / Alternates

Other realistic picks.

Modern 70B-class (DeepSeek/Qwen)

Qwen3.6-35B-A3B FP8 fleet

Kimi, GLM, or MiniMax entries from the full matrix, if memory allows

03 / More GPUs

Compare another card.

RTX 3060 12GB

RTX 4060 Ti 16GB

RTX 5080 16GB

RTX 3090 24GB

RTX 4090 24GB