Code Generation · 2021 · python
Mostly Basic Python Problems
974 crowd-sourced Python programming problems suitable for beginners. Covers programming fundamentals and standard library.
Current State of the Art
o4-mini (OpenAI): 94.9 pass@1
MBPP — pass@1
19 results · 5 SOTA advances · higher is better
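The page does not say how each pass@1 score was produced. As a rough sketch, assuming the unbiased pass@k estimator commonly used for code-generation benchmarks (n samples drawn per problem, c of them passing every test), the metric could be computed as below. The function names `pass_at_k` and `benchmark_pass_at_k` and the sample data are illustrative and not taken from the leaderboard.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples passes), given c of n samples passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # One minus the chance that all k samples drawn without replacement fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate and report it as a percentage."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical run: four problems, one sample each, three solved.
example_results = [(1, 1), (1, 0), (1, 1), (1, 1)]
example_score = benchmark_pass_at_k(example_results, k=1)
```

With k = 1 the estimator reduces to the fraction of passing samples per problem, so a single greedy sample per problem gives the plain solve rate.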
[Figure: Model Size vs Score, Pareto frontier. 5 models, model size on a log scale. Legend: Global, Bielik, PLLuM, Pareto.]
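A Pareto frontier over model size and score keeps only the models that no model of equal or smaller size beats on score. The sketch below illustrates the idea on the leaderboard entries whose names state a parameter count. Which five models the original chart plotted is not recoverable from this page, so the selection, the sizes read off the model names, and the helper name `pareto_frontier` are all assumptions.

```python
# (model, parameters in billions as stated in the model name, pass@1)
MODELS = [
    ("Qwen2.5-Coder-32B-Instruct", 32, 90.2),
    ("Codestral 22B", 22, 75.4),
    ("Gemma-3-27b", 27, 74.4),
    ("Gemma 3 12B IT", 12, 73.0),
    ("Gemma 3 4B IT", 4, 63.2),
    ("Code Llama 34B", 34, 62.6),
    ("StarCoder2-15B", 15, 54.4),
]


def pareto_frontier(models):
    """Return models not beaten on score by any model of equal or smaller size."""
    frontier = []
    best_score = float("-inf")
    # Walk from smallest to largest; at equal size, visit the higher score first.
    for name, size, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best_score:
            frontier.append((name, size, score))
            best_score = score
    return frontier


for name, size, score in pareto_frontier(MODELS):
    print(f"{name}: {size}B, {score}")
```

Sorting by size first means a single pass is enough: a model joins the frontier only if it scores higher than every smaller model seen so far.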
pass@1 Progress Over Time
Showing 4 breakthroughs from Aug 2023 to Mar 2026
Key Milestones
Total Improvement: 51.6%
Time Span: 2y 8m
Breakthroughs: 4
Current SOTA: 94.9
Top Models Performance Comparison
Top 10 models ranked by pass@1
Best Score: 94.9
Top Model: o4-mini
Models Compared: 10
Score Range: 5.7
pass@1 (primary metric)
| # | Model | Access | Organization | Score | Paper / Code | Date |
|---|---|---|---|---|---|---|
| 1 | o4-mini | API | OpenAI | 94.9 | | Mar 2026 |
| 2 | o3-mini | API | OpenAI | 93.3 | | Mar 2026 |
| 3 | Claude Opus 4 | API | Anthropic | 92 | | Mar 2026 |
| 4 | GPT-4.1 | API | OpenAI | 90.9 | | Mar 2026 |
| 5 | Qwen2.5-Coder-32B-Instruct | Open Source | Alibaba | 90.2 | | Sep 2024 |
| 6 | Claude Sonnet 4 | API | Anthropic | 89.6 | | Mar 2026 |
| 7 | DeepSeek-Coder-V2-Instruct | Open Source | DeepSeek | 89.4 | | Sep 2024 |
| 8 | DeepSeek-Coder-V2-Instruct | Open Source | DeepSeek | 89.4 | | Jun 2024 |
| 9 | DeepSeek-V3 | Open Source | DeepSeek | 89.3 | | Mar 2026 |
| 10 | Claude 3.5 Sonnet | API | Anthropic | 89.2 | | Dec 2025 |
| 11 | GPT-4o | API | OpenAI | 87.8 | | Dec 2025 |
| 12 | Llama-4-Maverick | Open Source | Meta | 77.6 | | Apr 2025 |
| 13 | Codestral 22B | | Mistral | 75.4 | Codestral: Hello, World! | May 2024 |
| 14 | Gemma-3-27b | | Google | 74.4 | | Mar 2025 |
| 15 | Gemma 3 12B IT | | Google DeepMind | 73 | | Mar 2025 |
| 16 | Llama-4-Scout | Open Source | Meta | 67.8 | | Apr 2025 |
| 17 | Gemma 3 4B IT | | Google DeepMind | 63.2 | | Mar 2025 |
| 18 | Code Llama 34B | Open Source | Meta | 62.6 | | Mar 2026 |
| 19 | StarCoder2-15B | Open Source | BigCode | 54.4 | | Feb 2024 |
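The summary figures reported above the table (best score 94.9, top model o4-mini, score range 5.7 across the top 10) can be re-derived from the rows. A minimal sketch, with the scores copied from the table and the variable names chosen for illustration:

```python
# (model, pass@1) pairs copied from the leaderboard table, in rank order.
SCORES = [
    ("o4-mini", 94.9),
    ("o3-mini", 93.3),
    ("Claude Opus 4", 92.0),
    ("GPT-4.1", 90.9),
    ("Qwen2.5-Coder-32B-Instruct", 90.2),
    ("Claude Sonnet 4", 89.6),
    ("DeepSeek-Coder-V2-Instruct", 89.4),
    ("DeepSeek-Coder-V2-Instruct", 89.4),
    ("DeepSeek-V3", 89.3),
    ("Claude 3.5 Sonnet", 89.2),
    ("GPT-4o", 87.8),
    ("Llama-4-Maverick", 77.6),
    ("Codestral 22B", 75.4),
    ("Gemma-3-27b", 74.4),
    ("Gemma 3 12B IT", 73.0),
    ("Llama-4-Scout", 67.8),
    ("Gemma 3 4B IT", 63.2),
    ("Code Llama 34B", 62.6),
    ("StarCoder2-15B", 54.4),
]

# Rank by score, highest first; sorted() is stable, so ties keep table order.
top10 = sorted(SCORES, key=lambda row: row[1], reverse=True)[:10]

best_model, best_score = top10[0]
# Round to one decimal to hide floating-point noise in the subtraction.
score_range = round(best_score - top10[-1][1], 1)

print(f"Best: {best_model} at {best_score}; top-10 range: {score_range}")
```

The range is the gap between rank 1 and rank 10, which is why the much lower scores further down the table do not affect it.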
Related Papers (3)
Qwen2.5-Coder Technical Report
Sep 2024 · Models: Qwen2.5-Coder-32B-Instruct
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Jun 2024 · Models: DeepSeek-Coder-V2-Instruct
StarCoder2 and The Stack v2: The Next Generation
Feb 2024 · Models: StarCoder2-15B