Code Generation · Agentic AI · 8 min read

MiniMax M2.1: The New SWE-bench Leader at 90% Lower Cost

A 229B parameter Mixture-of-Experts model achieves 74.0% on SWE-bench Verified while running only 10B active parameters per token. MiniMax has delivered what may be the most cost-efficient frontier coding model to date.

At a glance: 74.0% on SWE-bench Verified · 229B total parameters · 10B active per token · $0.30 per 1M tokens

On December 23, 2025, MiniMax released M2.1, a Mixture-of-Experts language model that immediately claimed the top position on SWE-bench Verified. The model achieves 74.0% on the benchmark, surpassing DeepSeek V3.2 (73.1%), Kimi K2 (71.3%), and Claude Sonnet 4.5 (68.2%). What makes this result significant is not just the raw score, but the efficiency: MiniMax M2.1 processes tokens using only 10B active parameters from its 229B total, enabling a 90% cost reduction compared to Claude.

MiniMax positions M2.1 as a "Digital Employee" optimized for agentic coding and tool use. The model includes Interleaved Thinking capability for long-horizon planning, making it well-suited for complex software engineering tasks that require sustained reasoning across multiple files and repositories.
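
To make the agentic tool-use framing concrete, here is a minimal sketch of a tool-calling loop against an OpenAI-compatible chat endpoint. The base URL, API key placeholder, model identifier, and the read_file tool are illustrative assumptions, not MiniMax's documented API.

agent_loop_sketch.py
import json
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint and model id (assumptions)
client = OpenAI(base_url="https://api.minimax.example/v1", api_key="YOUR_API_KEY")

# A single illustrative tool the model can call
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the repository",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

messages = [{"role": "user", "content": "Fix the JSON parsing bug in utils/parser.py"}]

while True:
    response = client.chat.completions.create(
        model="MiniMax-M2.1",  # hypothetical model identifier
        messages=messages,
        tools=tools,
    )
    msg = response.choices[0].message
    messages.append(msg)  # keep the assistant turn in the transcript
    if not msg.tool_calls:
        print(msg.content)  # final answer, no further tool use requested
        break
    for call in msg.tool_calls:
        # Only one tool exists in this sketch; a real agent dispatches by name
        args = json.loads(call.function.arguments)
        result = read_file(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })

In a real agent the loop would dispatch on the tool name, sandbox file access, and cap the number of iterations.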

Technical Specifications

Architecture: Mixture-of-Experts (MoE)
Total parameters: 229 billion
Active parameters: 10 billion per token
Release date: December 23, 2025
Special capabilities: Interleaved Thinking for long-horizon planning
Tensor formats: FP8, BF16, F32
Deployment frameworks: SGLang, vLLM, Transformers
API cost: $0.30 per 1M tokens

Benchmark Results: MiniMax M2.1 vs Claude Sonnet 4.5

Benchmark | MiniMax M2.1 | Claude Sonnet 4.5 | Delta (points)
SWE-bench Verified (real GitHub issue resolution) | 74.0% | 68.2% | +5.8
VIBE (full-stack application development) | 88.6% | 86.1% | +2.5
Multi-SWE-Bench (multi-repository engineering tasks) | 49.4% | 45.7% | +3.7

MiniMax M2.1 outperforms Claude Sonnet 4.5 across all three agentic coding benchmarks while costing 90% less per token.

Cost Analysis

API Pricing Comparison

MiniMax M2.1: $0.30 per 1M tokens
Claude Sonnet 4.5: $3.00 per 1M tokens
Cost reduction: 90%
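
To make the difference concrete, here is a back-of-the-envelope comparison using the list prices above; the 50M-tokens-per-month workload is a hypothetical figure, not a measured one.

cost_comparison.py
# Back-of-the-envelope cost comparison at the list prices above.
MINIMAX_PRICE = 0.30   # USD per 1M tokens
CLAUDE_PRICE = 3.00    # USD per 1M tokens
monthly_tokens_millions = 50  # hypothetical agentic workload

minimax_cost = monthly_tokens_millions * MINIMAX_PRICE   # $15.00
claude_cost = monthly_tokens_millions * CLAUDE_PRICE     # $150.00
savings = 1 - minimax_cost / claude_cost                 # 0.90

print(f"MiniMax M2.1:      ${minimax_cost:.2f}/month")
print(f"Claude Sonnet 4.5: ${claude_cost:.2f}/month")
print(f"Savings:           {savings:.0%}")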

Why MoE Enables Low Cost

The Mixture-of-Experts architecture activates only a subset of parameters per token. With 229B total parameters but only 10B active, MiniMax M2.1 achieves frontier performance while consuming a fraction of the compute. This architectural efficiency directly translates to lower API costs.
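
As a rough sketch of why sparse activation matters: per-token compute scales approximately with the parameters actually touched, so a 10B-active configuration does on the order of 4% of the per-token work a 229B dense model would, even though the full 229B weights still have to be held in memory. A minimal illustration:

moe_efficiency.py
# Rough per-token compute comparison: dense vs. MoE with sparse activation.
# Uses the ~2 FLOPs per parameter per token rule of thumb for a forward pass.
TOTAL_PARAMS = 229e9    # MiniMax M2.1 total parameters
ACTIVE_PARAMS = 10e9    # parameters activated per token

dense_flops_per_token = 2 * TOTAL_PARAMS   # if every parameter were used
moe_flops_per_token = 2 * ACTIVE_PARAMS    # only the routed experts run

print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")                          # ~4.4%
print(f"Per-token compute vs. dense: {moe_flops_per_token / dense_flops_per_token:.1%}")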

Deployment Example

MiniMax M2.1 supports deployment via SGLang, vLLM, and Transformers. Below is an example using the Transformers library; it loads BF16 weights for broad compatibility, since FP8 checkpoints are generally served through SGLang or vLLM, which provide the required low-precision kernels:

deploy_minimax.py
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load MiniMax M2.1 in BF16. FP8 checkpoints are typically served via
# SGLang or vLLM rather than plain Transformers.
model_name = "minimax/MiniMax-M2.1-229B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Example: code-repair prompt
prompt = """You are a software engineer. Fix the following bug:

File: utils/parser.py
Issue: JSON parsing fails on nested arrays with null values.

Provide the corrected code."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a fix. Interleaved Thinking is part of the model's own output
# behavior; it is not toggled through a generate() argument.
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.1,
    do_sample=True,
)

# Decode only the newly generated tokens
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(response)

For production deployments, SGLang or vLLM are recommended for higher throughput. MiniMax also provides an official API endpoint.
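
As a minimal sketch of a vLLM-based deployment (the checkpoint name and tensor-parallel size below are assumptions to adapt to the published repository and your hardware):

serve_vllm.py
# Minimal vLLM offline-inference sketch. The checkpoint name and
# tensor_parallel_size are assumptions; adjust for the published repo and GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="minimax/MiniMax-M2.1-229B",   # hypothetical HF repo id
    tensor_parallel_size=8,              # large MoE checkpoints span multiple GPUs
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.1, max_tokens=1024)
outputs = llm.generate(
    ["Fix the JSON parsing bug in utils/parser.py and explain the change."],
    params,
)
print(outputs[0].outputs[0].text)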

Competitive Landscape

MiniMax M2.1 enters a crowded field of frontier coding models. Here is how it compares to other leading options on SWE-bench Verified:

Model | SWE-bench Verified | Parameters | Cost | License
MiniMax M2.1 (SOTA) | 74.0% | 229B (10B active) | $0.30/1M | Permissive
DeepSeek V3.2 | 73.1% | 671B MoE | $0.27/1M | MIT
Kimi K2 | 71.3% | 1T MoE | $0.60/1M | Commercial
Claude Sonnet 4.5 | 68.2% | Unknown | $3.00/1M | Commercial
GPT-4o | 33.2% | Unknown | $2.50/1M | Commercial

Key differentiator: MiniMax M2.1's combination of top benchmark performance and low cost makes it the clear choice for cost-sensitive agentic applications. DeepSeek V3.2 offers similar pricing but slightly lower performance. Claude Sonnet 4.5 remains competitive for users who prioritize Anthropic's safety research and enterprise support.

Recommendations

When to Use MiniMax M2.1

  • High-volume agentic coding tasks where cost is critical
  • Multi-file repository navigation and bug fixing
  • Full-stack application development workflows
  • Self-hosted deployments with SGLang or vLLM
  • Multilingual software development projects

When to Consider Alternatives

  • Enterprise environments requiring vendor support (Claude)
  • Mathematical reasoning tasks (consider GLM-4.7)
  • Strict safety/alignment requirements (Claude)
  • OpenAI ecosystem integrations (GPT-4o/o1)

Conclusion

MiniMax M2.1 represents a significant milestone in the democratization of frontier AI capabilities. By achieving the top position on SWE-bench Verified at 90% lower cost than Claude, it demonstrates that the Mixture-of-Experts architecture can deliver exceptional performance without proportional compute costs.

For teams building agentic coding systems, automated code review pipelines, or developer productivity tools, MiniMax M2.1 offers the best cost-performance ratio currently available. The support for SGLang, vLLM, and Transformers provides deployment flexibility, while the Interleaved Thinking capability enables complex multi-step reasoning tasks.

As the competitive landscape for coding models intensifies, we expect continued rapid improvement. Track the latest SWE-bench results and model comparisons on CodeSOTA.
