The MMLU-Pro dataset contains 12K complex questions across various disciplines, including biology, business, chemistry, computer science, economics, engineering, math, physics, and psychology. It has 10 options per question, compared to the original MMLU's 4, making it more challenging. It also integrates more reasoning-focused problems, where Chain-of-Thought (CoT) results can be significantly higher than Perplexity (PPL).
Accuracy is the reported evaluation metric for MMLU-Pro. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year | Source |
|---|---|---|---|---|---|
| 01 | Qwen2.5-Plus | paper | 72.5 | N/A | Source ↗ |