Codesota · Models · Qwen2.5-Coder 32B · Alibaba · 9 results · 8 benchmarks

Model card

Qwen2.5-Coder 32B.

Alibaba · open-source · 32B params · Dense Transformer

32B parameters. SOTA open-source code model at release (Nov 2024). Matches GPT-4o on HumanEval.

§ 02 · Benchmarks

Every benchmark Qwen2.5-Coder 32B has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	Bugs2Fix	Computer Code · Bug Detection	accuracy	76.8%	#2/6	2024-09-19	source ↗
02	CodeSearchNet	Computer Code · Code Summarization	bleu-4	23.4%	#2/7	2024-09-19	source ↗
03	CrossCodeEval	Computer Code · Code Completion	exact-match	43.7%	#2/6	2024-09-19	source ↗
04	TransCoder (GeeksForGeeks)	Computer Code · Code Translation	computational-accuracy	86.3%	#3/7	2024-09-19	source ↗
05	MBPP	Computer Code · Code Generation	pass@1	90.2%	#5/19	2024-09-19	source ↗
06	HumanEval	Computer Code · Code Generation	pass@1	92.7%	#9/42	2025-03-01	source ↗
07	HumanEval	Computer Code · Code Generation	pass@1	92.7%	#9/42	2024-09-19	source ↗
08	SWE-bench	Computer Code · Code Generation	resolve-rate	55.4%	#22/32	2025-06-01	source ↗
09	LiveCodeBench	Computer Code · Code Generation	pass@1	47.8%	#22/30	2024-03-12	source ↗

Rank column shows this model’s position vs all other models scored on the same benchmark + metric (competitors after the slash). #1 in red means current SOTA. Sorted by rank, then newest result.
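The ordering described in the note above can be sketched in a few lines of Python. This is a minimal illustration, not Codesota's actual implementation: the row subset is copied from the table, while `parse_rank` and `sort_rows` are hypothetical helper names, and the number after the slash is read here simply as the size of the field.

```python
from datetime import date

# A few rows from the table above as (benchmark, rank, date), deliberately shuffled.
ROWS = [
    ("SWE-bench", "#22/32", "2025-06-01"),
    ("HumanEval", "#9/42", "2024-09-19"),
    ("LiveCodeBench", "#22/30", "2024-03-12"),
    ("HumanEval", "#9/42", "2025-03-01"),
    ("MBPP", "#5/19", "2024-09-19"),
]


def parse_rank(rank: str) -> tuple[int, int]:
    """Split a rank string such as '#9/42' into (position, field size)."""
    position, field = rank.lstrip("#").split("/")
    return int(position), int(field)


def sort_rows(rows):
    """Order rows by rank position (ascending), then by date (newest first)."""
    def key(row):
        _, rank, day = row
        position, _ = parse_rank(rank)
        # Negating the ordinal puts later dates first within the same rank.
        return (position, -date.fromisoformat(day).toordinal())

    return sorted(rows, key=key)


sorted_rows = sort_rows(ROWS)
for benchmark, rank, day in sorted_rows:
    print(f"{benchmark:<14} {rank:<7} {day}")
```

Under these assumptions the two HumanEval rows tie on rank, so the 2025-03-01 result is listed before the 2024-09-19 one, matching the order in the table.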

§ 03 · Strengths by area

Where Qwen2.5-Coder 32B actually performs.

§ 04 · Papers

3 papers with results for Qwen2.5-Coder 32B.

2024-09-19 · Computer Code · 6 results
Qwen2.5-Coder Technical Report
2024-03-12 · Computer Code · 1 result
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
2023-10-10 · Computer Code · 1 result
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao et al.

§ 05 · Related models

Other Alibaba models scored on Codesota.

Qwen3-235B-A22B

235B (22B active) params · 9 results · 1 SOTA

7B params · 5 results

Qwen2.5-72B-Instruct

72B params · 4 results

§ 06 · Sources & freshness

Where these numbers come from.

arxiv

results

shadow-page-humaneval

result

swebench-leaderboard

result

official-leaderboard

result

9 of 9 rows marked verified. First result 2024-03-12, latest 2025-06-01.
