Qwen3-235B-A22B · Alibaba · 21 results · 13 benchmarks
Model card

Qwen3-235B-A22B.

Alibaba · open-weights · 235B (22B active) params · MoE · 1 current SOTA

Qwen3 flagship MoE model, May 2025

§ 02 · Benchmarks

Every benchmark Qwen3-235B-A22B has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank |
|---|---|---|---|---|---|
| 01 | WebArena | Agentic AI · Web & Desktop Agents | accuracy | 95.6% | #1/1 |
| 02 | MBPP | Computer Code · Code Generation | pass-1 | 81.4% | #2/3 |
| 03 | MBPP+ | Computer Code · Code Generation | pass-1 | 81.4% | #2/9 |
| 04 | BIG-Bench Hard | Reasoning · Multi-step Reasoning | accuracy | 88.9% | #3/11 |
| 05 | AIME 2024 | Reasoning · Mathematical Reasoning | accuracy | 85.7% | #6/11 |
| 06 | LiveCodeBench Pro | Computer Code · Code Generation | elo | 1673.00 | #6/10 |
| 07 | LiveCodeBench | Computer Code · Code Generation | pass@1 | 70.7% | #8/30 |
| 08 | AIME 2025 | Reasoning · Mathematical Reasoning | accuracy | 81.5% | #15/22 |
| 09 | LiveCodeBench | Computer Code · Code Generation | pass-1 | 70.7% | #15/24 |
| 10 | MMLU | Reasoning · Commonsense Reasoning | accuracy | 87.8% | #27/64 |
| 11 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 94.4% | #29/48 |
| 12 | MATH | Reasoning · Mathematical Reasoning | accuracy | 71.8% | #34/46 |
| 13 | GPQA Diamond | Reasoning · Multi-step Reasoning | accuracy | 71.1% | #47/74 |
| 14 | GPQA Diamond | Reasoning · Multi-step Reasoning | accuracy | 71.1% | #47/74 |
| 15 | PLCC | Natural Language Processing · Polish Cultural Competency | grammar | 66.0% | #61/165 |
| 16 | PLCC | Natural Language Processing · Polish Cultural Competency | geography | 69.0% | #93/165 |
| 17 | PLCC | Natural Language Processing · Polish Cultural Competency | history | 70.0% | #94/165 |
| 18 | PLCC | Natural Language Processing · Polish Cultural Competency | average | 55.0% | #103/165 |
| 19 | PLCC | Natural Language Processing · Polish Cultural Competency | vocabulary | 43.0% | #110/165 |
| 20 | PLCC | Natural Language Processing · Polish Cultural Competency | culture-and-tradition | 45.0% | #118/165 |
| 21 | PLCC | Natural Language Processing · Polish Cultural Competency | art-and-entertainment | 37.0% | #124/165 |

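The Rank column shows this model’s position among all models scored on the same benchmark and metric; the number after the slash is the total number of models ranked, this one included (hence #1/1 for WebArena). A rank of #1 marks the current SOTA. Rows are sorted by rank, then by newest result.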
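
The rank notation can be read mechanically. The following is a minimal sketch, assuming every rank has the form `#position/total`; the names `Rank` and `parse_rank` are hypothetical helpers for illustration and are not part of Codesota.

```python
import re
from typing import NamedTuple


class Rank(NamedTuple):
    """A leaderboard rank such as '#2/9' (hypothetical helper type)."""
    position: int  # this model's place on the benchmark + metric
    total: int     # number of models ranked, this one included

    @property
    def is_sota(self) -> bool:
        # Position 1 is what the table marks as current SOTA.
        return self.position == 1


_RANK_RE = re.compile(r"^#(\d+)/(\d+)$")


def parse_rank(text: str) -> Rank:
    """Parse a '#position/total' string into a Rank."""
    match = _RANK_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a rank string: {text!r}")
    position, total = int(match.group(1)), int(match.group(2))
    if not 1 <= position <= total:
        raise ValueError(f"position outside 1..total: {text!r}")
    return Rank(position, total)


if __name__ == "__main__":
    # Three rank strings copied from the table above.
    for name, raw in [("WebArena", "#1/1"), ("MBPP+", "#2/9"), ("GPQA Diamond", "#47/74")]:
        rank = parse_rank(raw)
        print(name, rank.position, rank.total, rank.is_sota)
```

Keeping `total` alongside `position` matters when comparing rows: #2 of 3 and #2 of 9 are the same position against very different field sizes.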
§ 03 · Strengths by area

Where Qwen3-235B-A22B actually performs.

Agentic AI · 1 benchmark · avg rank #1.0 · 1 SOTA
Computer Code · 4 benchmarks · avg rank #6.6
Reasoning · 7 benchmarks · avg rank #26.0
Natural Language Processing · 1 benchmark · avg rank #100.4
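
The page does not say how the per-area figures are computed. They are consistent with a plain mean of the rank over every row in the area (repeated rows included), with the benchmark count taken over distinct benchmark names. A sketch under that assumption, using the ranks from the § 02 table; `ROWS` and `summarize` are hypothetical names:

```python
from collections import defaultdict

# (area, benchmark, rank position) for each of the 21 rows in § 02.
ROWS = [
    ("Agentic AI", "WebArena", 1),
    ("Computer Code", "MBPP", 2),
    ("Computer Code", "MBPP+", 2),
    ("Computer Code", "LiveCodeBench Pro", 6),
    ("Computer Code", "LiveCodeBench", 8),
    ("Computer Code", "LiveCodeBench", 15),
    ("Reasoning", "BIG-Bench Hard", 3),
    ("Reasoning", "AIME 2024", 6),
    ("Reasoning", "AIME 2025", 15),
    ("Reasoning", "MMLU", 27),
    ("Reasoning", "GSM8K", 29),
    ("Reasoning", "MATH", 34),
    ("Reasoning", "GPQA Diamond", 47),
    ("Reasoning", "GPQA Diamond", 47),
    ("Natural Language Processing", "PLCC", 61),
    ("Natural Language Processing", "PLCC", 93),
    ("Natural Language Processing", "PLCC", 94),
    ("Natural Language Processing", "PLCC", 103),
    ("Natural Language Processing", "PLCC", 110),
    ("Natural Language Processing", "PLCC", 118),
    ("Natural Language Processing", "PLCC", 124),
]


def summarize(rows):
    """Group rows by area; return benchmark count, mean rank, and SOTA count."""
    ranks = defaultdict(list)
    benchmarks = defaultdict(set)
    for area, benchmark, rank in rows:
        ranks[area].append(rank)
        benchmarks[area].add(benchmark)
    return {
        area: {
            "benchmarks": len(benchmarks[area]),        # distinct benchmark names
            "avg_rank": round(sum(r) / len(r), 1),      # mean over all rows
            "sota": r.count(1),                         # rows ranked #1
        }
        for area, r in ranks.items()
    }


if __name__ == "__main__":
    for area, stats in summarize(ROWS).items():
        print(area, stats)
```

Under this assumption the sketch reproduces the figures listed above: 1.0, 6.6, 26.0 and 100.4. Averaging over distinct benchmarks instead of rows would give different values, which is why the row-level mean is the reading used here.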
§ 04 · Papers

1 paper with results for Qwen3-235B-A22B.

  1. Qwen3 Technical Report · 2025-05-14 · 11 results

§ 05 · Related models

Other Alibaba models scored on Codesota.

Qwen2-VL 72B · 9 results
Qwen3.5-397B-A17B · 8 results
Qwen3.5-122B-A10B · 6 results
Qwen3.5-27B · 6 results
Qwen3.5-35B-A3B · 6 results
Qwen2-VL 7B · 7B params · 5 results
Qwen2.5-72B-Instruct · 72B params · 4 results
Qwen2.5-Coder 32B · 32B params · 4 results
§ 06 · Sources & freshness

Where these numbers come from.

pwc-dump · 11 results
sdadas/PLCC · 7 results
livecodebench-pro-official · 1 result
arxiv-2505.09388 · 1 result
qwen-model-card · 1 result
9 of 21 rows marked verified.