Codesota · Benchmark · MMLU-Pro

MMLU-Pro.

The MMLU-Pro dataset contains 12K complex questions across various disciplines, including biology, business, chemistry, computer science, economics, engineering, math, physics, and psychology. Each question has 10 answer options, compared with 4 in the original MMLU, which lowers the odds of a lucky guess and makes the benchmark more challenging. It also integrates more reasoning-focused problems, on which Chain-of-Thought (CoT) prompting can score significantly higher than perplexity-based (PPL) answer selection.
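The effect of the larger option set can be made concrete with the expected accuracy of uniform random guessing, which is 1/k for k options. The following is a minimal sketch, assuming every question has exactly the stated number of options; the function name `chance_accuracy` is hypothetical and not part of any MMLU-Pro tooling.

```python
def chance_accuracy(num_options: int) -> float:
    """Expected accuracy, in percent, of guessing uniformly at random.

    Assumption: every question has exactly `num_options` choices.
    """
    if num_options < 1:
        raise ValueError("num_options must be a positive integer")
    return 100.0 / num_options


# Original MMLU has 4 options per question; MMLU-Pro has 10.
mmlu_baseline = chance_accuracy(4)
mmlu_pro_baseline = chance_accuracy(10)

print(f"MMLU chance baseline:     {mmlu_baseline:.1f}%")
print(f"MMLU-Pro chance baseline: {mmlu_pro_baseline:.1f}%")
```

Under this assumption the guessing floor drops from 25% to 10%, so leaderboard scores near 10 are close to what random guessing would achieve.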

§ 01 · Leaderboard

Results by metric.


Accuracy

Accuracy is the reported evaluation metric for MMLU-Pro. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
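As an illustration of the metric, accuracy can be computed as the percentage of questions whose predicted option letter matches the reference answer. This is a minimal sketch under that assumption; the function name `accuracy_percent` and the sample data are hypothetical, and individual vendors may extract or normalize answers differently before scoring.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Percentage of predictions that match the reference option letter.

    Assumption: each item is a single option letter (A-J for ten options);
    comparison ignores case and surrounding whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical toy data: three of four predictions match.
preds = ["A", "j", "C", "D"]
gold = ["A", "J", "B", "D"]
score = accuracy_percent(preds, gold)
print(f"Accuracy: {score:.1f}")
```

Scores in the table are on this 0 to 100 scale, which is why higher is better.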

Trust tiers for Accuracy: verified, paper, vendor, community, unverified
Rank | Model | Repository or source | Trust | Score | Year
01 | MiniMax M2.1 | MiniMaxAI/MiniMax-M2.1 | vendor | 88 | N/A
02 | Intern S2 Preview | internlm/Intern-S2-Preview | vendor | 88 | N/A
03 | Qwen3.5 397B A17B | Qwen/Qwen3.5-397B-A17B | vendor | 87.8 | N/A
04 | DeepSeek V4 Pro | deepseek-ai/DeepSeek-V4-Pro | vendor | 87.5 | N/A
05 | Kimi K2.5 | moonshotai/Kimi-K2.5 | vendor | 87.1 | N/A
06 | NVIDIA Nemotron 3 Ultra 550B A55B BF16 | nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 | vendor | 86.8 | N/A
07 | NVIDIA Nemotron 3 Ultra 550B A55B NVFP4 | nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 | vendor | 86.8 | N/A
08 | Qwen3.5 122B A10B | Qwen/Qwen3.5-122B-A10B | vendor | 86.7 | N/A
09 | DeepSeek V4 Flash | deepseek-ai/DeepSeek-V4-Flash | vendor | 86.4 | N/A
10 | Qwen3.6 27B | Qwen/Qwen3.6-27B | vendor | 86.2 | N/A
11 | Qwen3.5 27B | Qwen/Qwen3.5-27B | vendor | 86.1 | N/A
12 | GLM 5 | zai-org/GLM-5 | vendor | 86 | N/A
13 | Qwen3.6 35B A3B | Qwen/Qwen3.6-35B-A3B | vendor | 85.2 | N/A
14 | DeepSeek R1 0528 | deepseek-ai/DeepSeek-R1-0528 | vendor | 85 | N/A
15 | GLM 4.5 | zai-org/GLM-4.5 | vendor | 84.6 | N/A
16 | Qwen3 235B A22B Thinking 2507 | Qwen/Qwen3-235B-A22B-Thinking-2507 | vendor | 84.5 | N/A
17 | Step 3.5 Flash | stepfun-ai/Step-3.5-Flash | vendor | 84.4 | N/A
18 | DeepSeek R1 | deepseek-ai/DeepSeek-R1 | vendor | 84 | N/A
19 | K EXAONE 236B A23B | LGAI-EXAONE/K-EXAONE-236B-A23B | vendor | 83.8 | N/A
20 | NVIDIA Nemotron 3 Super 120B A12B BF16 | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | vendor | 83.73 | N/A
21 | Intern S1 | internlm/Intern-S1 | vendor | 83.5 | N/A
22 | EXAONE 4.5 33B | LGAI-EXAONE/EXAONE-4.5-33B | vendor | 83.3 | N/A
23 | Qwen3 235B A22B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507 | vendor | 83 | N/A
24 | Seed OSS 36B Instruct | ByteDance-Seed/Seed-OSS-36B-Instruct | vendor | 82.7 | N/A
25 | LongCat Flash Chat | meituan-longcat/LongCat-Flash-Chat | vendor | 82.7 | N/A
26 | MiniMax M2 | MiniMaxAI/MiniMax-M2 | vendor | 82 | N/A
27 | GLM 4.5 Air | zai-org/GLM-4.5-Air | vendor | 81.4 | N/A
28 | DeepSeek V3 0324 | deepseek-ai/DeepSeek-V3-0324 | vendor | 81.3 | N/A
29 | MiniMax M1 40k | MiniMaxAI/MiniMax-M1-40k | vendor | 81.1 | N/A
30 | JoyAI LLM Flash | jdopensource/JoyAI-LLM-Flash | vendor | 81.02 | N/A
31 | Kimi K2 Instruct | moonshotai/Kimi-K2-Instruct | vendor | 81 | N/A
32 | Qwen3 30B A3B Thinking 2507 | Qwen/Qwen3-30B-A3B-Thinking-2507 | vendor | 80.9 | N/A
33 | gpt oss 120b | openai/gpt-oss-120b | vendor | 80.8 | N/A
34 | MiniMax M2.5 | MiniMaxAI/MiniMax-M2.5 | vendor | 80.1 | N/A
35 | ERNIE 4.5 300B A47B PT | baidu/ERNIE-4.5-300B-A47B-PT | vendor | 78.4 | N/A
36 | NVIDIA Nemotron 3 Nano 30B A3B BF16 | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | vendor | 78.3 | N/A
37 | LongCat Flash Lite | meituan-longcat/LongCat-Flash-Lite | vendor | 78.29 | N/A
38 | DeepSeek V3 | deepseek-ai/DeepSeek-V3 | vendor | 75.87 | N/A
39 | MiniMax Text 01 | MiniMaxAI/MiniMax-Text-01 | vendor | 75.7 | N/A
40 | gpt oss 20b | openai/gpt-oss-20b | vendor | 73.6 | N/A
41 | GPT-4o | Original MMLU-Pro paper, 5-shot CoT | paper | 72.6 | 2024
42 | Qwen2.5 72B | Qwen/Qwen2.5-72B | vendor | 71.59 | N/A
43 | phi 4 | microsoft/phi-4 | vendor | 70.4 | N/A
44 | Qwen3 4B Instruct 2507 | Qwen/Qwen3-4B-Instruct-2507 | vendor | 69.6 | N/A
45 | ERNIE 4.5 300B A47B Base PT | baidu/ERNIE-4.5-300B-A47B-Base-PT | vendor | 69.5 | N/A
46 | Qwen2.5 32B | Qwen/Qwen2.5-32B | vendor | 69.23 | N/A
47 | Gemini 1.5 Pro | Original MMLU-Pro paper, 5-shot CoT | paper | 69 | 2024
48 | MiMo V2.5 Pro | XiaomiMiMo/MiMo-V2.5-Pro | vendor | 68.5 | N/A
49 | Claude 3 Opus | Original MMLU-Pro paper, 5-shot CoT | paper | 68.5 | 2024
50 | Qwen3 235B A22B | Qwen/Qwen3-235B-A22B | vendor | 68.18 | N/A
51 | Mistral Large Instruct 2411 | mistralai/Mistral-Large-Instruct-2411 | vendor | 67.94 | N/A
52 | Hunyuan A13B Instruct | tencent/Hunyuan-A13B-Instruct | vendor | 67.3 | N/A
53 | Mistral Large Instruct 2407 | mistralai/Mistral-Large-Instruct-2407 | vendor | 65.91 | N/A
54 | DeepSeek V2.5 | deepseek-ai/DeepSeek-V2.5 | vendor | 65.83 | N/A
55 | Seed OSS 36B Base | ByteDance-Seed/Seed-OSS-36B-Base | vendor | 65.1 | N/A
56 | NVIDIA Nemotron 3 Nano 30B A3B Base BF16 | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 | vendor | 65.1 | N/A
57 | granite 4.1 30b | ibm-granite/granite-4.1-30b | vendor | 64.09 | N/A
58 | GPT-4-Turbo | Original MMLU-Pro paper, 5-shot CoT | paper | 63.7 | 2024
59 | Qwen2.5 14B | Qwen/Qwen2.5-14B | vendor | 63.69 | N/A
60 | Qwen3 30B A3B Base | Qwen/Qwen3-30B-A3B-Base | vendor | 61.7 | N/A
61 | Llama 3.1 405B | meta-llama/Llama-3.1-405B | vendor | 61.6 | N/A
62 | Nemotron H 56B Base 8K | nvidia/Nemotron-H-56B-Base-8K | vendor | 60.5 | N/A
63 | Seed OSS 36B Base woSyn | ByteDance-Seed/Seed-OSS-36B-Base-woSyn | vendor | 60.4 | N/A
64 | Tencent Hunyuan Large | tencent/Tencent-Hunyuan-Large | vendor | 60.2 | N/A
65 | Mellum2 12B A2.5B Base Pretrain | JetBrains/Mellum2-12B-A2.5B-Base-Pretrain | vendor | 59.31 | N/A
66 | Mellum2 12B A2.5B Base | JetBrains/Mellum2-12B-A2.5B-Base | vendor | 59.31 | N/A
67 | Gemini 1.5 Flash | Original MMLU-Pro paper, 5-shot CoT | paper | 59.1 | 2024
68 | EXAONE 3.5 32B Instruct | LGAI-EXAONE/EXAONE-3.5-32B-Instruct | vendor | 58.91 | N/A
69 | MiMo 7B RL | XiaomiMiMo/MiMo-7B-RL | vendor | 58.6 | N/A
70 | internlm3 8b instruct | internlm/internlm3-8b-instruct | vendor | 57.6 | N/A
71 | ERNIE 4.5 21B A3B Base PT | baidu/ERNIE-4.5-21B-A3B-Base-PT | vendor | 56.7 | N/A
72 | Llama 3 70B Instruct | Original MMLU-Pro paper, 5-shot CoT | paper | 56.2 | 2024
73 | granite 4.1 8b | ibm-granite/granite-4.1-8b | vendor | 55.99 | N/A
74 | Phi 3 medium 4k instruct | microsoft/Phi-3-medium-4k-instruct | vendor | 55.7 | N/A
75 | DeepSeek V2 Chat | deepseek-ai/DeepSeek-V2-Chat | vendor | 54.81 | N/A
76 | Mistral Small 24B Base 2501 | mistralai/Mistral-Small-24B-Base-2501 | vendor | 54.4 | N/A
77 | Phi 4 mini instruct | microsoft/Phi-4-mini-instruct | vendor | 52.8 | N/A
78 | Meta Llama 3 70B | meta-llama/Meta-Llama-3-70B | vendor | 52.78 | N/A
79 | Llama 3.1 70B | meta-llama/Llama-3.1-70B | vendor | 52.47 | N/A
80 | Yi 1.5 34B Chat | 01-ai/Yi-1.5-34B-Chat | vendor | 52.29 | N/A
81 | Phi 3 medium 128k instruct | microsoft/Phi-3-medium-128k-instruct | vendor | 51.91 | N/A
82 | MAmmoTH2 8x7B Plus | TIGER-Lab/MAmmoTH2-8x7B-Plus | vendor | 50.4 | N/A
83 | Qwen1.5 110B | Qwen/Qwen1.5-110B | vendor | 49.93 | N/A
84 | granite 4.1 3b | ibm-granite/granite-4.1-3b | vendor | 49.83 | N/A
85 | AI21 Jamba Large 1.5 | ai21labs/AI21-Jamba-Large-1.5 | vendor | 49.46 | N/A
86 | Mistral Small Instruct 2409 | mistralai/Mistral-Small-Instruct-2409 | vendor | 48.4 | N/A
87 | glm 4 9b | zai-org/glm-4-9b | vendor | 47.92 | N/A
88 | Phi 3.5 mini instruct | microsoft/Phi-3.5-mini-instruct | vendor | 47.87 | N/A
89 | EXAONE 3.5 7.8B Instruct | LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct | vendor | 46.24 | N/A
90 | Yi 1.5 9B Chat | 01-ai/Yi-1.5-9B-Chat | vendor | 45.95 | N/A
91 | Phi 3 mini 4k instruct | microsoft/Phi-3-mini-4k-instruct | vendor | 45.66 | N/A
92 | aya expanse 32b | CohereLabs/aya-expanse-32b | vendor | 45.41 | N/A
93 | gemma 2 9b | google/gemma-2-9b | vendor | 45.1 | N/A
94 | Qwen2.5 7B | Qwen/Qwen2.5-7B | vendor | 45 | N/A
95 | Phi 3 mini 128k instruct | microsoft/Phi-3-mini-128k-instruct | vendor | 43.86 | N/A
96 | Qwen2.5 3B | Qwen/Qwen2.5-3B | vendor | 43.73 | N/A
97 | MAmmoTH2 8B Plus | TIGER-Lab/MAmmoTH2-8B-Plus | vendor | 43.35 | N/A
98 | Yi 34B | 01-ai/Yi-34B | vendor | 43.03 | N/A
99 | Mathstral 7B v0.1 | mistralai/Mathstral-7B-v0.1 | vendor | 42 | N/A
100 | MiMo 7B Base | XiaomiMiMo/MiMo-7B-Base | vendor | 41.9 | N/A
101 | DeepSeek Coder V2 Lite Instruct | deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | vendor | 41.57 | N/A
102 | Mixtral 8x7B v0.1 | mistralai/Mixtral-8x7B-v0.1 | vendor | 41.03 | N/A
103 | Meta Llama 3 8B Instruct | meta-llama/Meta-Llama-3-8B-Instruct | vendor | 40.98 | N/A
104 | MAmmoTH2 7B Plus | TIGER-Lab/MAmmoTH2-7B-Plus | vendor | 40.85 | N/A
105 | Qwen2 7B | Qwen/Qwen2-7B | vendor | 40.73 | N/A
106 | Mistral Nemo Base 2407 | mistralai/Mistral-Nemo-Base-2407 | vendor | 39.77 | N/A
107 | EXAONE 3.5 2.4B Instruct | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | vendor | 39.1 | N/A
108 | Yi 1.5 6B Chat | 01-ai/Yi-1.5-6B-Chat | vendor | 38.23 | N/A
109 | Qwen1.5 14B Chat | Qwen/Qwen1.5-14B-Chat | vendor | 38.02 | N/A
110 | Ministral 8B Instruct 2410 | mistralai/Ministral-8B-Instruct-2410 | vendor | 37.93 | N/A
111 | c4ai command r v01 | CohereLabs/c4ai-command-r-v01 | vendor | 37.9 | N/A
112 | internlm2 math plus 20b | internlm/internlm2-math-plus-20b | vendor | 37.1 | N/A
113 | LLaDA 8B Instruct | GSAI-ML/LLaDA-8B-Instruct | vendor | 37 | N/A
114 | Llama 3 Smaug 8B | abacusai/Llama-3-Smaug-8B | vendor | 36.93 | N/A
115 | Llama 3.1 8B | meta-llama/Llama-3.1-8B | vendor | 36.6 | N/A
116 | Meta Llama 3 8B | meta-llama/Meta-Llama-3-8B | vendor | 35.36 | N/A
117 | deepseek math 7b instruct | deepseek-ai/deepseek-math-7b-instruct | vendor | 35.3 | N/A
118 | DeepSeek Coder V2 Lite Base | deepseek-ai/DeepSeek-Coder-V2-Lite-Base | vendor | 34.37 | N/A
119 | aya expanse 8b | CohereLabs/aya-expanse-8b | vendor | 33.74 | N/A
120 | gemma 7b | google/gemma-7b | vendor | 33.73 | N/A
121 | internlm2 math plus 7b | internlm/internlm2-math-plus-7b | vendor | 33.5 | N/A
122 | granite 3.1 8b base | ibm-granite/granite-3.1-8b-base | vendor | 33.08 | N/A
123 | Qwen2.5 1.5B | Qwen/Qwen2.5-1.5B | vendor | 32.1 | N/A
124 | granite 3.0 8b base | ibm-granite/granite-3.0-8b-base | vendor | 31.03 | N/A
125 | Mistral 7B Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | vendor | 30.84 | N/A
126 | Mistral 7B v0.2 | mistral-community/Mistral-7B-v0.2 | vendor | 30.43 | N/A
127 | Qwen1.5 7B Chat | Qwen/Qwen1.5-7B-Chat | vendor | 29.06 | N/A
128 | Yi 6B Chat | 01-ai/Yi-6B-Chat | vendor | 28.84 | N/A
129 | Yi 6B | 01-ai/Yi-6B | vendor | 26.51 | N/A
130 | granite 3.1 2b base | ibm-granite/granite-3.1-2b-base | vendor | 23.89 | N/A
131 | llemma 7b | EleutherAI/llemma_7b | vendor | 23.45 | N/A
132 | Qwen2 1.5B Instruct | Qwen/Qwen2-1.5B-Instruct | vendor | 22.62 | N/A
133 | Qwen2 1.5B | Qwen/Qwen2-1.5B | vendor | 22.56 | N/A
134 | Llama 3.2 3B | meta-llama/Llama-3.2-3B | vendor | 22.17 | N/A
135 | granite 3.0 2b base | ibm-granite/granite-3.0-2b-base | vendor | 21.72 | N/A
136 | granite 3.1 3b a800m base | ibm-granite/granite-3.1-3b-a800m-base | vendor | 20.39 | N/A
137 | SmolLM2 1.7B | HuggingFaceTB/SmolLM2-1.7B | vendor | 18.31 | N/A
138 | gemma 2b | google/gemma-2b | vendor | 15.85 | N/A
139 | Qwen2 0.5B | Qwen/Qwen2-0.5B | vendor | 14.97 | N/A
140 | Qwen2.5 0.5B | Qwen/Qwen2.5-0.5B | vendor | 14.92 | N/A
141 | granite 3.1 1b a400m base | ibm-granite/granite-3.1-1b-a400m-base | vendor | 12.34 | N/A
142 | Llama 3.2 1B | meta-llama/Llama-3.2-1B | vendor | 11.95 | N/A
143 | SmolLM 1.7B | HuggingFaceTB/SmolLM-1.7B | vendor | 11.93 | N/A
144 | SmolLM2 360M | HuggingFaceTB/SmolLM2-360M | vendor | 11.38 | N/A
145 | SmolLM 135M | HuggingFaceTB/SmolLM-135M | vendor | 11.22 | N/A
146 | SmolLM 360M | HuggingFaceTB/SmolLM-360M | vendor | 10.95 | N/A
147 | SmolLM2 135M | HuggingFaceTB/SmolLM2-135M | vendor | 10.85 | N/A
148 | Qwen2.5 VL 72B Instruct | Qwen/Qwen2.5-VL-72B-Instruct | vendor | 0.65 | N/A

Lineage

MMLU-Pro in context.

This benchmark (1)
active · 2024-06
MMLU-Pro