Codesota · Benchmark · MBPP

MBPP.

974 crowd-sourced Python programming problems designed to be solvable by beginner programmers. The problems cover programming fundamentals and standard-library functionality.
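To make the task format concrete, here is a minimal sketch of how an MBPP-style problem could be checked. The record below is invented for illustration and is not taken from the dataset; the field names (`task_id`, `text`, `test_list`) are an assumption about the layout of the public release, and `passes_all_tests` is a hypothetical helper, not part of any official harness.

```python
from typing import Dict, List

# Hypothetical MBPP-style record: a short natural-language task plus
# assert-based test cases. Invented for illustration only.
problem = {
    "task_id": 0,
    "text": "Write a function to find the largest of three numbers.",
    "test_list": [
        "assert max_of_three(1, 2, 3) == 3",
        "assert max_of_three(10, 4, 7) == 10",
        "assert max_of_three(-1, -5, -3) == -1",
    ],
}

# A candidate solution, as a model might generate it.
candidate = """
def max_of_three(a, b, c):
    return max(a, b, c)
"""


def passes_all_tests(candidate_source: str, tests: List[str]) -> bool:
    """Return True if the candidate defines code that satisfies every assert."""
    namespace: Dict[str, object] = {}
    try:
        exec(candidate_source, namespace)  # define the candidate function
        for test in tests:
            exec(test, namespace)  # raises AssertionError on a failed test
    except Exception:
        return False
    return True


print(passes_all_tests(candidate, problem["test_list"]))  # prints True
```

Running generated code with `exec` in the same process is only acceptable for a toy example like this one; a real evaluation would need sandboxing and timeouts.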

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

pass@1

pass@1 is the evaluation metric reported for MBPP: the percentage of problems for which a single generated solution passes all of the problem's test cases. Codesota tracks published pass@1 scores so readers can compare state-of-the-art results across sources and model families.

Higher is better
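As a rough illustration of how a pass@1 score can be computed, the sketch below uses the unbiased pass@k estimator that is common for code-generation benchmarks: with n samples per problem of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), which reduces to c/n for k = 1. The function names and the sample data are hypothetical, and the page does not state how each listed score was actually obtained (for example, one greedy sample versus several sampled completions).

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number that passed all tests,
    k: the k in pass@k (must satisfy 1 <= k <= n).
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Probability that a random size-k subset of the n samples has no pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, as a percentage. results holds (n, c) pairs."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical run: four problems, one sample each, three of them solved.
single_sample_results = [(1, 1), (1, 0), (1, 1), (1, 1)]
print(benchmark_pass_at_k(single_sample_results, k=1))  # prints 75.0
```

With one sample per problem, the score is simply the share of problems solved, which is why a higher value is better.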

Trust tiers for pass@1: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score | Year | Source |
|---|---|---|---|---|---|
| 01 | o4-mini | verified | 94.9 | 2026 | OpenAI model card. MBPP pass@1. |
| 02 | o3-mini | verified | 93.3 | 2026 | OpenAI o3-mini model card. MBPP pass@1. |
| 03 | Claude Opus 4 | verified | 92 | 2026 | Anthropic model card. MBPP pass@1. |
| 04 | Claude 3.5 Sonnet (Oct 2024) | verified | 91 | 2024 | Qwen2.5-Coder tech report, Table 16 |
| 05 | GPT-4.1 | verified | 90.9 | 2026 | OpenAI GPT-4.1 model card. MBPP pass@1. |
| 06 | Qwen2.5-Coder 32B | verified | 90.2 | 2024 | Table 2, arxiv:2409.12186. MBPP pass@1. |
| 07 | Qwen2.5-Coder-32B-Instruct | verified | 90.2 | 2024 | Qwen2.5-Coder tech report, Table 16 |
| 08 | Claude Sonnet 4 | verified | 89.6 | 2026 | Anthropic model card. MBPP pass@1. |
| 09 | DeepSeek-Coder-V2-Instruct | verified | 89.4 | 2024 | Qwen2.5-Coder tech report, Table 16 |
| 10 | DeepSeek-V3 | verified | 89.3 | 2026 | DeepSeek-V3 tech report. MBPP pass@1. |
| 11 | Claude 3.5 Sonnet | unverified | 89.2 | 2025 | |
| 12 | claude-35-sonnet | paper | 89.2 | 2025 | |
| 13 | GPT-4o | unverified | 87.8 | 2025 | |
| 14 | GPT-4o (Aug 2024) | verified | 86.8 | 2024 | Qwen2.5-Coder tech report, Table 16 |
| 15 | Qwen2.5-Coder-7B-Instruct | verified | 83.5 | 2024 | Qwen2.5-Coder tech report, Table 16 |
| 16 | Codestral 22B v0.1 | verified | 78.2 | 2024 | Qwen2.5-Coder tech report, Table 16 |
| 17 | Llama 4 Maverick (17B-128E) | verified | 77.6 | 2025 | Meta Llama 4 Maverick model card |
| 18 | Llama-4-Maverick | verified | 77.6 | 2025 | Meta Llama 4 Maverick model card |
| 19 | Codestral 22B | verified | 75.4 | 2024 | Mistral official blog, May 2024. MBPP pass@1. |
| 20 | Gemma-3-27b | verified | 74.4 | 2025 | Gemma 3 tech report |
| 21 | Gemma 3 27B IT | verified | 74.4 | 2025 | Gemma 3 tech report |
| 22 | Gemma 3 12B IT | verified | 73 | 2025 | Gemma 3 tech report |
| 23 | Llama 4 Scout (17B-16E) | verified | 67.8 | 2025 | Meta Llama 4 Scout model card, pre-trained |
| 24 | Llama-4-Scout | verified | 67.8 | 2025 | Meta Llama 4 Scout model card, pre-trained |
| 25 | Gemma 3 4B IT | verified | 63.2 | 2025 | Gemma 3 tech report |
| 26 | Code Llama 34B | verified | 62.6 | 2026 | Code Llama paper, arxiv:2308.12950. MBPP pass@1. |
| 27 | StarCoder2 15B | verified | 54.4 | 2024 | Table 2, arxiv:2402.19173. StarCoder2-15B base model. |

§ 03 · Lineage

MBPP in context.

This benchmark (1)
MBPP · saturated · 2021-08

§ 04 · Submit a result

Add to the leaderboard.
