Best local AI model for A100 80GB.

Do not default to an older model such as Llama 3.1 70B. For many tasks, a newer 35B-A3B model (35B total parameters, roughly 3B active per token) is more useful than an older 70B. 80GB makes a modern 70B/72B possible, but pick one only if it wins on your specific target benchmark.


01 / Recommendation

Run this size class.

Recommended default

Qwen3.6-35B-A3B for serving, or a modern 70B/72B only if it wins your evals

Use FP8, INT8, or a high-quality 4-bit quantization. This is the highest-scoring current open-weight model that fits this card cleanly; it is selected by benchmark score first, then fit, then freshness, not by parameter count.
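To make "fits this card cleanly" concrete, here is a rough back-of-the-envelope sketch of weight memory at different precisions. It counts weights only; the 20% headroom reserved for KV cache, activations, and runtime overhead is an assumption, not a measured figure, and the function names are hypothetical. Real quantized checkpoints also carry some extra bytes for scales and metadata.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def fits(params_billion: float, bits_per_weight: float,
         card_gib: float = 80.0, headroom_fraction: float = 0.2) -> bool:
    """True if the weights fit while leaving headroom_fraction of the card free.

    headroom_fraction is an assumed reserve for KV cache and activations;
    tune it for your context length and batch size.
    """
    budget_gib = card_gib * (1.0 - headroom_fraction)
    return weight_memory_gib(params_billion, bits_per_weight) <= budget_gib


if __name__ == "__main__":
    # Size classes discussed on this page, at 16-bit, 8-bit, and 4-bit weights.
    for params in (35, 70, 72):
        for bits in (16, 8, 4):
            size = weight_memory_gib(params, bits)
            verdict = "fits" if fits(params, bits) else "too tight"
            print(f"{params}B @ {bits}-bit: {size:6.1f} GiB weights -> {verdict}")
```

Under these assumptions a 35B model at 8-bit leaves well over half the card free, while a 70B/72B model needs roughly 4-bit weights to leave comparable headroom.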

Benchmark anchor

Qwen3.6-35B-A3B: MMLU-Pro 85.6/85.0, GPQA Diamond 84.9/84.8. The Artificial Analysis (AA) Intelligence Index is currently led by Kimi K2.6, MiMo-V2.5-Pro, and DeepSeek V4 Pro: generation beats parameter count.

Evidence

The top open-weight models in the Artificial Analysis rankings are all recent releases (Kimi K2.6, MiMo-V2.5-Pro, DeepSeek V4 Pro), which illustrates that model generation beats old parameter-count heuristics.
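The selection order used on this page (benchmark, then fit, then freshness) can be sketched as a small ranking function. This is one possible reading, not the page's actual scoring code: fit is treated as a hard filter, your own eval score ranks the survivors, and release date breaks ties. The `Candidate` type, the `pick` function, and the placeholder entries are hypothetical; substitute your own eval results.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    eval_score: float   # score on YOUR target benchmark, higher is better
    fits_card: bool     # whether the chosen quantization fits the card
    released: date      # release date, used only as a tie-breaker


def pick(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the best candidate that fits, or None if nothing fits."""
    eligible = [c for c in candidates if c.fits_card]
    if not eligible:
        return None
    # Rank by eval score first, then prefer the newer release on ties.
    return max(eligible, key=lambda c: (c.eval_score, c.released))


if __name__ == "__main__":
    # Placeholder entries with made-up scores, for illustration only.
    pool = [
        Candidate("older-dense-70b", 0.61, True, date(2024, 7, 1)),
        Candidate("newer-35b-a3b", 0.74, True, date(2026, 1, 1)),
        Candidate("huge-moe", 0.80, False, date(2026, 2, 1)),
    ]
    best = pick(pool)
    print(best.name if best else "nothing fits")
```

Parameter count never enters the sort key, which is the point: a larger model wins only by scoring higher on the evals you care about.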

02 / Alternates

Other realistic picks.

Qwen3.6-35B-A3B high-throughput

Modern 70B/72B (benchmark-selected)

DeepSeek V4-class rows for cheaper throughput

03 / More GPUs

Compare another card.

RTX 3060 12GB, RTX 4060 Ti 16GB, RTX 5080 16GB, RTX 3090 24GB, RTX 4090 24GB