The MMLU-Pro dataset contains 12K complex questions across various disciplines, including biology, business, chemistry, computer science, economics, engineering, math, physics, and psychology. Each question has 10 answer options, compared to 4 in the original MMLU, which makes it more challenging. It also includes more reasoning-focused problems, on which Chain-of-Thought (CoT) prompting can score significantly higher than perplexity-based (PPL) answer selection.
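To make the ten-option format concrete, here is a minimal sketch of how a CoT response might be reduced to a single letter from A to J, along with the random-guess baseline for 4 versus 10 options. The regular expression, the `extract_choice` and `chance_accuracy` names, and the assumption that a response ends with a phrase like "the answer is (X)" are illustrative and not taken from any official evaluation harness.

```python
import re
from typing import Optional

# Hypothetical pattern: assumes the model states its final choice as
# "answer is (X)" or "answer is X", with X one of the ten letters A-J.
ANSWER_PATTERN = re.compile(r"(?i:answer is)\s*\(?([A-J])\b")


def extract_choice(cot_response: str) -> Optional[str]:
    """Return the last letter A-J announced in the response, or None."""
    matches = ANSWER_PATTERN.findall(cot_response)
    return matches[-1] if matches else None


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy (in percent) of uniform random guessing."""
    return 100.0 / num_options


print(extract_choice("Step 1 ... Step 2 ... so the answer is (C)."))
print(chance_accuracy(4), chance_accuracy(10))
```

Under these assumptions, a response with no recognizable final letter yields `None`, which a scorer would typically count as incorrect. Moving from 4 to 10 options lowers the random-guess baseline from 25% to 10%.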
Accuracy is the reported evaluation metric for MMLU-Pro. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
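As a rough illustration of the metric, accuracy is the share of questions for which the predicted option matches the reference answer, expressed here as a percentage to match the scores in the table. The function name and the convention that an unparseable prediction is represented as `None` are assumptions made for this sketch.

```python
from typing import Optional, Sequence


def accuracy_percent(
    predictions: Sequence[Optional[str]], gold: Sequence[str]
) -> float:
    """Percentage of questions whose predicted letter equals the gold letter.

    A prediction of None (no answer extracted) never matches, so it is
    counted as wrong.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Four questions, two answered correctly.
print(accuracy_percent(["A", "J", None, "C"], ["A", "J", "B", "D"]))
```

Published scores may differ in prompt format, number of few-shot examples, and answer-extraction rules, so numbers from different sources are not always strictly comparable.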
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | MiniMax M2.1 | vendor | 88 | N/A | Code, Source |
| 02 | Intern S2 Preview | vendor | 88 | N/A | Code |
| 03 | Qwen3.5 397B A17B | vendor | 87.8 | N/A | Code, Source |
| 04 | DeepSeek V4 Pro | vendor | 87.5 | N/A | Code |
| 05 | Kimi K2.5 | vendor | 87.1 | N/A | Code |
| 06 | NVIDIA Nemotron 3 Ultra 550B A55B BF16 | vendor | 86.8 | N/A | Code |
| 07 | NVIDIA Nemotron 3 Ultra 550B A55B NVFP4 | vendor | 86.8 | N/A | Code |
| 08 | Qwen3.5 122B A10B | vendor | 86.7 | N/A | Code, Source |
| 09 | DeepSeek V4 Flash | vendor | 86.4 | N/A | Code |
| 10 | Qwen3.6 27B | vendor | 86.2 | N/A | Code |
| 11 | Qwen3.5 27B | vendor | 86.1 | N/A | Code, Source |
| 12 | GLM 5 | vendor | 86 | N/A | Code, Source |
| 13 | Qwen3.6 35B A3B | vendor | 85.2 | N/A | Code |
| 14 | DeepSeek R1 0528 | vendor | 85 | N/A | Code |
| 15 | GLM 4.5 | vendor | 84.6 | N/A | Code, Source |
| 16 | Qwen3 235B A22B Thinking 2507 | vendor | 84.5 | N/A | Code, Source |
| 17 | Step 3.5 Flash | vendor | 84.4 | N/A | Code, Source |
| 18 | DeepSeek R1 | vendor | 84 | N/A | Code, Source |
| 19 | K EXAONE 236B A23B | vendor | 83.8 | N/A | Code |
| 20 | NVIDIA Nemotron 3 Super 120B A12B BF16 | vendor | 83.73 | N/A | Code |
| 21 | Intern S1 | vendor | 83.5 | N/A | Code, Source |
| 22 | EXAONE 4.5 33B | vendor | 83.3 | N/A | Code |
| 23 | Qwen3 235B A22B Instruct 2507 | vendor | 83 | N/A | Code, Source |
| 24 | Seed OSS 36B Instruct | vendor | 82.7 | N/A | Code, Source |
| 25 | LongCat Flash Chat | vendor | 82.7 | N/A | Code, Source |
| 26 | MiniMax M2 | vendor | 82 | N/A | Code, Source |
| 27 | GLM 4.5 Air | vendor | 81.4 | N/A | Code, Source |
| 28 | DeepSeek V3 0324 | vendor | 81.3 | N/A | Code, Source |
| 29 | MiniMax M1 40k | vendor | 81.1 | N/A | Code, Source |
| 30 | JoyAI LLM Flash | vendor | 81.02 | N/A | Code |
| 31 | Kimi K2 Instruct | vendor | 81 | N/A | Code, Source |
| 32 | Qwen3 30B A3B Thinking 2507 | vendor | 80.9 | N/A | Code, Source |
| 33 | gpt oss 120b | vendor | 80.8 | N/A | Code, Source |
| 34 | MiniMax M2.5 | vendor | 80.1 | N/A | Code, Source |
| 35 | ERNIE 4.5 300B A47B PT | vendor | 78.4 | N/A | Code, Source |
| 36 | NVIDIA Nemotron 3 Nano 30B A3B BF16 | vendor | 78.3 | N/A | Code, Source |
| 37 | LongCat Flash Lite | vendor | 78.29 | N/A | Code |
| 38 | DeepSeek V3 | vendor | 75.87 | N/A | Code, Source |
| 39 | MiniMax Text 01 | vendor | 75.7 | N/A | Code, Source |
| 40 | gpt oss 20b | vendor | 73.6 | N/A | Code, Source |
| 41 | GPT-4o | paper | 72.6 | 2024 | Paper |
| 42 | Qwen2.5 72B | vendor | 71.59 | N/A | Code, Source |
| 43 | phi 4 | vendor | 70.4 | N/A | Code, Source |
| 44 | Qwen3 4B Instruct 2507 | vendor | 69.6 | N/A | Code |
| 45 | ERNIE 4.5 300B A47B Base PT | vendor | 69.5 | N/A | Code, Source |
| 46 | Qwen2.5 32B | vendor | 69.23 | N/A | Code, Source |
| 47 | Gemini 1.5 Pro | paper | 69 | 2024 | Paper |
| 48 | MiMo V2.5 Pro | vendor | 68.5 | N/A | Code |
| 49 | Claude 3 Opus | paper | 68.5 | 2024 | Paper |
| 50 | Qwen3 235B A22B | vendor | 68.18 | N/A | Code, Source |
| 51 | Mistral Large Instruct 2411 | vendor | 67.94 | N/A | Code, Source |
| 52 | Hunyuan A13B Instruct | vendor | 67.3 | N/A | Code, Source |
| 53 | Mistral Large Instruct 2407 | vendor | 65.91 | N/A | Code, Source |
| 54 | DeepSeek V2.5 | vendor | 65.83 | N/A | Code, Source |
| 55 | Seed OSS 36B Base | vendor | 65.1 | N/A | Code, Source |
| 56 | NVIDIA Nemotron 3 Nano 30B A3B Base BF16 | vendor | 65.1 | N/A | Code, Source |
| 57 | granite 4.1 30b | vendor | 64.09 | N/A | Code |
| 58 | GPT-4-Turbo | paper | 63.7 | 2024 | Paper |
| 59 | Qwen2.5 14B | vendor | 63.69 | N/A | Code, Source |
| 60 | Qwen3 30B A3B Base | vendor | 61.7 | N/A | Code, Source |
| 61 | Llama 3.1 405B | vendor | 61.6 | N/A | Code, Source |
| 62 | Nemotron H 56B Base 8K | vendor | 60.5 | N/A | Code, Source |
| 63 | Seed OSS 36B Base woSyn | vendor | 60.4 | N/A | Code, Source |
| 64 | Tencent Hunyuan Large | vendor | 60.2 | N/A | Code, Source |
| 65 | Mellum2 12B A2.5B Base Pretrain | vendor | 59.31 | N/A | Code |
| 66 | Mellum2 12B A2.5B Base | vendor | 59.31 | N/A | Code |
| 67 | Gemini 1.5 Flash | paper | 59.1 | 2024 | Paper |
| 68 | EXAONE 3.5 32B Instruct | vendor | 58.91 | N/A | Code, Source |
| 69 | MiMo 7B RL | vendor | 58.6 | N/A | Code, Source |
| 70 | internlm3 8b instruct | vendor | 57.6 | N/A | Code, Source |
| 71 | ERNIE 4.5 21B A3B Base PT | vendor | 56.7 | N/A | Code, Source |
| 72 | Llama 3 70B Instruct | paper | 56.2 | 2024 | Paper |
| 73 | granite 4.1 8b | vendor | 55.99 | N/A | Code |
| 74 | Phi 3 medium 4k instruct | vendor | 55.7 | N/A | Code, Source |
| 75 | DeepSeek V2 Chat | vendor | 54.81 | N/A | Code, Source |
| 76 | Mistral Small 24B Base 2501 | vendor | 54.4 | N/A | Code, Source |
| 77 | Phi 4 mini instruct | vendor | 52.8 | N/A | Code, Source |
| 78 | Meta Llama 3 70B | vendor | 52.78 | N/A | Code, Source |
| 79 | Llama 3.1 70B | vendor | 52.47 | N/A | Code, Source |
| 80 | Yi 1.5 34B Chat | vendor | 52.29 | N/A | Code, Source |
| 81 | Phi 3 medium 128k instruct | vendor | 51.91 | N/A | Code, Source |
| 82 | MAmmoTH2 8x7B Plus | vendor | 50.4 | N/A | Code, Source |
| 83 | Qwen1.5 110B | vendor | 49.93 | N/A | Code, Source |
| 84 | granite 4.1 3b | vendor | 49.83 | N/A | Code |
| 85 | AI21 Jamba Large 1.5 | vendor | 49.46 | N/A | Code, Source |
| 86 | Mistral Small Instruct 2409 | vendor | 48.4 | N/A | Code, Source |
| 87 | glm 4 9b | vendor | 47.92 | N/A | Code, Source |
| 88 | Phi 3.5 mini instruct | vendor | 47.87 | N/A | Code, Source |
| 89 | EXAONE 3.5 7.8B Instruct | vendor | 46.24 | N/A | Code, Source |
| 90 | Yi 1.5 9B Chat | vendor | 45.95 | N/A | Code, Source |
| 91 | Phi 3 mini 4k instruct | vendor | 45.66 | N/A | Code, Source |
| 92 | aya expanse 32b | vendor | 45.41 | N/A | Code, Source |
| 93 | gemma 2 9b | vendor | 45.1 | N/A | Code, Source |
| 94 | Qwen2.5 7B | vendor | 45 | N/A | Code, Source |
| 95 | Phi 3 mini 128k instruct | vendor | 43.86 | N/A | Code, Source |
| 96 | Qwen2.5 3B | vendor | 43.73 | N/A | Code, Source |
| 97 | MAmmoTH2 8B Plus | vendor | 43.35 | N/A | Code, Source |
| 98 | Yi 34B | vendor | 43.03 | N/A | Code, Source |
| 99 | Mathstral 7B v0.1 | vendor | 42 | N/A | Code, Source |
| 100 | MiMo 7B Base | vendor | 41.9 | N/A | Code, Source |
| 101 | DeepSeek Coder V2 Lite Instruct | vendor | 41.57 | N/A | Code, Source |
| 102 | Mixtral 8x7B v0.1 | vendor | 41.03 | N/A | Code, Source |
| 103 | Meta Llama 3 8B Instruct | vendor | 40.98 | N/A | Code, Source |
| 104 | MAmmoTH2 7B Plus | vendor | 40.85 | N/A | Code, Source |
| 105 | Qwen2 7B | vendor | 40.73 | N/A | Code, Source |
| 106 | Mistral Nemo Base 2407 | vendor | 39.77 | N/A | Code, Source |
| 107 | EXAONE 3.5 2.4B Instruct | vendor | 39.1 | N/A | Code, Source |
| 108 | Yi 1.5 6B Chat | vendor | 38.23 | N/A | Code, Source |
| 109 | Qwen1.5 14B Chat | vendor | 38.02 | N/A | Code, Source |
| 110 | Ministral 8B Instruct 2410 | vendor | 37.93 | N/A | Code, Source |
| 111 | c4ai command r v01 | vendor | 37.9 | N/A | Code, Source |
| 112 | internlm2 math plus 20b | vendor | 37.1 | N/A | Code, Source |
| 113 | LLaDA 8B Instruct | vendor | 37 | N/A | Code, Source |
| 114 | Llama 3 Smaug 8B | vendor | 36.93 | N/A | Code, Source |
| 115 | Llama 3.1 8B | vendor | 36.6 | N/A | Code, Source |
| 116 | Meta Llama 3 8B | vendor | 35.36 | N/A | Code, Source |
| 117 | deepseek math 7b instruct | vendor | 35.3 | N/A | Code, Source |
| 118 | DeepSeek Coder V2 Lite Base | vendor | 34.37 | N/A | Code, Source |
| 119 | aya expanse 8b | vendor | 33.74 | N/A | Code, Source |
| 120 | gemma 7b | vendor | 33.73 | N/A | Code, Source |
| 121 | internlm2 math plus 7b | vendor | 33.5 | N/A | Code, Source |
| 122 | granite 3.1 8b base | vendor | 33.08 | N/A | Code, Source |
| 123 | Qwen2.5 1.5B | vendor | 32.1 | N/A | Code, Source |
| 124 | granite 3.0 8b base | vendor | 31.03 | N/A | Code, Source |
| 125 | Mistral 7B Instruct v0.2 | vendor | 30.84 | N/A | Code, Source |
| 126 | Mistral 7B v0.2 | vendor | 30.43 | N/A | Code, Source |
| 127 | Qwen1.5 7B Chat | vendor | 29.06 | N/A | Code, Source |
| 128 | Yi 6B Chat | vendor | 28.84 | N/A | Code, Source |
| 129 | Yi 6B | vendor | 26.51 | N/A | Code, Source |
| 130 | granite 3.1 2b base | vendor | 23.89 | N/A | Code, Source |
| 131 | llemma 7b | vendor | 23.45 | N/A | Code, Source |
| 132 | Qwen2 1.5B Instruct | vendor | 22.62 | N/A | Code, Source |
| 133 | Qwen2 1.5B | vendor | 22.56 | N/A | Code, Source |
| 134 | Llama 3.2 3B | vendor | 22.17 | N/A | Code, Source |
| 135 | granite 3.0 2b base | vendor | 21.72 | N/A | Code, Source |
| 136 | granite 3.1 3b a800m base | vendor | 20.39 | N/A | Code, Source |
| 137 | SmolLM2 1.7B | vendor | 18.31 | N/A | Code, Source |
| 138 | gemma 2b | vendor | 15.85 | N/A | Code, Source |
| 139 | Qwen2 0.5B | vendor | 14.97 | N/A | Code, Source |
| 140 | Qwen2.5 0.5B | vendor | 14.92 | N/A | Code, Source |
| 141 | granite 3.1 1b a400m base | vendor | 12.34 | N/A | Code, Source |
| 142 | Llama 3.2 1B | vendor | 11.95 | N/A | Code, Source |
| 143 | SmolLM 1.7B | vendor | 11.93 | N/A | Code, Source |
| 144 | SmolLM2 360M | vendor | 11.38 | N/A | Code, Source |
| 145 | SmolLM 135M | vendor | 11.22 | N/A | Code, Source |
| 146 | SmolLM 360M | vendor | 10.95 | N/A | Code, Source |
| 147 | SmolLM2 135M | vendor | 10.85 | N/A | Code, Source |
| 148 | Qwen2.5 VL 72B Instruct | vendor | 0.65 | N/A | Code, Source |
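With ten options per question, uniform random guessing yields about 10% accuracy, so a score far below that may indicate a different reporting scale (for example, a fraction rather than a percentage) or a different evaluation setup. The sketch below shows one possible way to flag such rows when reading the table programmatically. The helper names and the 10% threshold are assumptions for illustration, and a flagged row is a prompt to check the linked source, not evidence that the entry is wrong.

```python
from typing import List, Tuple

CHANCE_PERCENT = 10.0  # random guessing with ten answer options


def parse_rows(table_md: str) -> List[Tuple[str, float]]:
    """Extract (model, score) pairs from a Markdown leaderboard table.

    Assumes the first four columns are Rank, Model, Trust, Score.
    Header and separator rows are skipped because their Score cell
    is not a number.
    """
    rows = []
    for line in table_md.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 4:
            continue
        try:
            score = float(cells[3])
        except ValueError:
            continue
        rows.append((cells[1], score))
    return rows


def below_chance(rows: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return rows whose score is under the random-guess baseline."""
    return [(model, score) for model, score in rows if score < CHANCE_PERCENT]


sample = """
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 147 | SmolLM2 135M | vendor | 10.85 | N/A | Code, Source |
| 148 | Qwen2.5 VL 72B Instruct | vendor | 0.65 | N/A | Code, Source |
"""
print(below_chance(parse_rows(sample)))
```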