MMLU
MMLU (Massive Multitask Language Understanding): 15,908 multiple-choice questions across 57 subjects, ranging from elementary to professional level.
Benchmark Stats
Models: 18 · Papers: 18 · Metrics: 1
Metric: accuracy (higher is better)
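The accuracy metric above is simply the fraction of questions where the model's chosen answer letter matches the gold answer, reported as a percentage. A minimal sketch (the function name and inputs are illustrative, not part of any benchmark API):

```python
def mmlu_accuracy(predictions, answers):
    """Return percent accuracy; both inputs are lists of choice letters (A-D).

    Hypothetical helper for illustration, not the official scoring code.
    """
    if len(predictions) != len(answers):
        raise ValueError("prediction/answer length mismatch")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)

# 3 of 4 answers match, so accuracy is 75.0
print(mmlu_accuracy(["A", "C", "B", "D"], ["A", "C", "D", "D"]))  # → 75.0
```

Leaderboard scores such as 92.9 are this percentage computed over all 15,908 questions.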
| Rank | Model | Code | Score (%) | Paper / Source |
|---|---|---|---|---|
| 1 | o3 | - | 92.9 | openai-simple-evals |
| 2 | o1 | - | 91.8 | openai-simple-evals |
| 3 | gpt-4.5-preview | - | 90.8 | openai-simple-evals |
| 4 | o1-preview | - | 90.8 | openai-simple-evals |
| 5 | gpt-4.1 | - | 90.2 | openai-simple-evals |
| 6 | o4-mini | - | 90.0 | openai-simple-evals |
| 7 | llama-3.1-405b | - | 88.6 | openai-simple-evals |
| 8 | deepseek-v3 | - | 88.5 | openai-simple-evals |
| 9 | claude-3.5-sonnet | - | 88.3 | openai-simple-evals |
| 10 | grok-2 | - | 87.5 | openai-simple-evals |
| 11 | gpt-4o | - | 87.2 | openai-simple-evals |
| 12 | claude-3-opus | - | 86.8 | openai-simple-evals |
| 13 | gpt-4-turbo | - | 86.7 | openai-simple-evals |
| 14 | o3-mini | - | 85.9 | openai-simple-evals |
| 15 | gemini-1.5-pro | - | 85.9 | openai-simple-evals |
| 16 | o1-mini | - | 85.2 | openai-simple-evals |
| 17 | gpt-4o-mini | - | 82.0 | openai-simple-evals |
| 18 | llama-3.1-70b | - | 82.0 | openai-simple-evals |