Code Generation · Agentic AI · 8 min read

MiniMax M2.1: The New SWE-bench Leader at 90% Lower Cost

A 229B parameter Mixture-of-Experts model achieves 74.0% on SWE-bench Verified while running only 10B active parameters per token. MiniMax has delivered what may be the most cost-efficient frontier coding model to date.

At a glance: 74.0% on SWE-bench Verified · 229B total parameters · 10B active per token · $0.30 per 1M tokens

On December 23, 2025, MiniMax released M2.1, a Mixture-of-Experts language model that immediately claimed the top position on SWE-bench Verified. The model achieves 74.0% on the benchmark, surpassing DeepSeek V3.2 (73.1%), Kimi K2 (71.3%), and Claude Sonnet 4.5 (68.2%). What makes this result significant is not just the raw score, but the efficiency: MiniMax M2.1 processes tokens using only 10B active parameters from its 229B total, enabling a 90% cost reduction compared to Claude.

MiniMax positions M2.1 as a "Digital Employee" optimized for agentic coding and tool use. The model includes Interleaved Thinking capability for long-horizon planning, making it well-suited for complex software engineering tasks that require sustained reasoning across multiple files and repositories.
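
To make the agentic tool-use framing concrete, here is a minimal sketch of a tool-calling loop against an OpenAI-compatible chat endpoint. The base URL, API key placeholder, model identifier, and the read_file tool are illustrative assumptions, not MiniMax's documented API.

agent_loop_sketch.py
import json
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint and model id (assumptions)
client = OpenAI(base_url="https://api.minimax.example/v1", api_key="YOUR_API_KEY")

# A single illustrative tool the model can call
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the repository",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

messages = [{"role": "user", "content": "Fix the JSON parsing bug in utils/parser.py"}]

while True:
    response = client.chat.completions.create(
        model="MiniMax-M2.1",  # hypothetical model identifier
        messages=messages,
        tools=tools,
    )
    msg = response.choices[0].message
    messages.append(msg)  # keep the assistant turn in the transcript
    if not msg.tool_calls:
        print(msg.content)  # final answer, no further tool use requested
        break
    for call in msg.tool_calls:
        # Only one tool exists in this sketch; a real agent dispatches by name
        args = json.loads(call.function.arguments)
        result = read_file(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })

In a real agent the loop would dispatch on the tool name, sandbox file access, and cap the number of iterations.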

Technical Specifications

Architecture: Mixture-of-Experts (MoE)
Total parameters: 229 billion
Active parameters: 10 billion per token
Release date: December 23, 2025
Special capabilities: Interleaved Thinking for long-horizon planning
Tensor formats: FP8, BF16, F32
Deployment frameworks: SGLang, vLLM, Transformers
API cost: $0.30 per 1M tokens

Benchmark Results: MiniMax M2.1 vs Claude Sonnet 4.5

Benchmark | MiniMax M2.1 | Claude Sonnet 4.5 | Delta (points)
SWE-bench Verified (real GitHub issue resolution) | 74.0% | 68.2% | +5.8
VIBE (full-stack application development) | 88.6% | 86.1% | +2.5
Multi-SWE-Bench (multi-repository engineering tasks) | 49.4% | 45.7% | +3.7

MiniMax M2.1 outperforms Claude Sonnet 4.5 across all three agentic coding benchmarks while costing 90% less per token.

Cost Analysis

API Pricing Comparison

MiniMax M2.1: $0.30 per 1M tokens
Claude Sonnet 4.5: $3.00 per 1M tokens
Cost reduction: 90%
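
To make the difference concrete, here is a back-of-the-envelope comparison using the list prices above; the 50M-tokens-per-month workload is a hypothetical figure, not a measured one.

cost_comparison.py
# Back-of-the-envelope cost comparison at the list prices above.
MINIMAX_PRICE = 0.30   # USD per 1M tokens
CLAUDE_PRICE = 3.00    # USD per 1M tokens
monthly_tokens_millions = 50  # hypothetical agentic workload

minimax_cost = monthly_tokens_millions * MINIMAX_PRICE   # $15.00
claude_cost = monthly_tokens_millions * CLAUDE_PRICE     # $150.00
savings = 1 - minimax_cost / claude_cost                 # 0.90

print(f"MiniMax M2.1:      ${minimax_cost:.2f}/month")
print(f"Claude Sonnet 4.5: ${claude_cost:.2f}/month")
print(f"Savings:           {savings:.0%}")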

Why MoE Enables Low Cost

The Mixture-of-Experts architecture activates only a subset of parameters per token. With 229B total parameters but only 10B active, MiniMax M2.1 achieves frontier performance while consuming a fraction of the compute. This architectural efficiency directly translates to lower API costs.
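
As a rough sketch of why sparse activation matters: per-token compute scales approximately with the parameters actually touched, so a 10B-active configuration does on the order of 4% of the per-token work a 229B dense model would, even though the full 229B weights still have to be held in memory. A minimal illustration:

moe_efficiency.py
# Rough per-token compute comparison: dense vs. MoE with sparse activation.
# Uses the ~2 FLOPs per parameter per token rule of thumb for a forward pass.
TOTAL_PARAMS = 229e9    # MiniMax M2.1 total parameters
ACTIVE_PARAMS = 10e9    # parameters activated per token

dense_flops_per_token = 2 * TOTAL_PARAMS   # if every parameter were used
moe_flops_per_token = 2 * ACTIVE_PARAMS    # only the routed experts run

print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")                          # ~4.4%
print(f"Per-token compute vs. dense: {moe_flops_per_token / dense_flops_per_token:.1%}")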

Deployment Example

MiniMax M2.1 supports deployment via SGLang, vLLM, and Transformers. Below is an example using the Transformers library; it loads BF16 weights for broad compatibility, since FP8 checkpoints are generally served through SGLang or vLLM, which provide the required low-precision kernels:

deploy_minimax.py
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load MiniMax M2.1 in BF16. FP8 checkpoints are typically served via
# SGLang or vLLM rather than plain Transformers.
model_name = "minimax/MiniMax-M2.1-229B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Example: code-repair prompt
prompt = """You are a software engineer. Fix the following bug:

File: utils/parser.py
Issue: JSON parsing fails on nested arrays with null values.

Provide the corrected code."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a fix. Interleaved Thinking is part of the model's own output
# behavior; it is not toggled through a generate() argument.
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.1,
    do_sample=True,
)

# Decode only the newly generated tokens
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(response)

For production deployments, SGLang or vLLM are recommended for higher throughput. MiniMax also provides an official API endpoint.
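
As a minimal sketch of a vLLM-based deployment (the checkpoint name and tensor-parallel size below are assumptions to adapt to the published repository and your hardware):

serve_vllm.py
# Minimal vLLM offline-inference sketch. The checkpoint name and
# tensor_parallel_size are assumptions; adjust for the published repo and GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="minimax/MiniMax-M2.1-229B",   # hypothetical HF repo id
    tensor_parallel_size=8,              # large MoE checkpoints span multiple GPUs
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.1, max_tokens=1024)
outputs = llm.generate(
    ["Fix the JSON parsing bug in utils/parser.py and explain the change."],
    params,
)
print(outputs[0].outputs[0].text)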

Competitive Landscape

MiniMax M2.1 enters a crowded field of frontier coding models. Here is how it compares to other leading options on SWE-bench Verified:

Model | SWE-bench Verified | Parameters | Cost | License
MiniMax M2.1 (SOTA) | 74.0% | 229B (10B active) | $0.30/1M | Permissive
DeepSeek V3.2 | 73.1% | 671B MoE | $0.27/1M | MIT
Kimi K2 | 71.3% | 1T MoE | $0.60/1M | Commercial
Claude Sonnet 4.5 | 68.2% | Unknown | $3.00/1M | Commercial
GPT-4o | 33.2% | Unknown | $2.50/1M | Commercial

Key differentiator: MiniMax M2.1's combination of top benchmark performance and low cost makes it the clear choice for cost-sensitive agentic applications. DeepSeek V3.2 offers similar pricing but slightly lower performance. Claude Sonnet 4.5 remains competitive for users who prioritize Anthropic's safety research and enterprise support.

Recommendations

When to Use MiniMax M2.1

  • High-volume agentic coding tasks where cost is critical
  • Multi-file repository navigation and bug fixing
  • Full-stack application development workflows
  • Self-hosted deployments with SGLang or vLLM
  • Multilingual software development projects

When to Consider Alternatives

  • Enterprise environments requiring vendor support (Claude)
  • Mathematical reasoning tasks (consider GLM-4.7)
  • Strict safety/alignment requirements (Claude)
  • OpenAI ecosystem integrations (GPT-4o/o1)

Conclusion

MiniMax M2.1 represents a significant milestone in the democratization of frontier AI capabilities. By achieving the top position on SWE-bench Verified at 90% lower cost than Claude, it demonstrates that the Mixture-of-Experts architecture can deliver exceptional performance without proportional compute costs.

For teams building agentic coding systems, automated code review pipelines, or developer productivity tools, MiniMax M2.1 offers the best cost-performance ratio currently available. The support for SGLang, vLLM, and Transformers provides deployment flexibility, while the Interleaved Thinking capability enables complex multi-step reasoning tasks.

As the competitive landscape for coding models intensifies, we expect continued rapid improvement. Track the latest SWE-bench results and model comparisons on CodeSOTA.
