Commonsense Reasoning · 2021 · en
Massive Multitask Language Understanding
15,908 multiple-choice questions across 57 subjects, ranging from elementary to professional level.
Current State of the Art
o1-preview (OpenAI): 92.3 accuracy
Top Models Performance Comparison
Top 6 models ranked by accuracy
Best Score: 92.3
Top Model: o1-preview
Models Compared: 6
Score Range: 10.3
Metric: accuracy (primary)
| # | Model | Organization | Access | Score | Paper / Code | Date |
|---|---|---|---|---|---|---|
| 1 | o1-preview | OpenAI | | 92.3 | | Dec 2025 |
| 2 | GPT-4o | OpenAI | API | 88.7 | | Dec 2025 |
| 3 | Claude 3.5 Sonnet | Anthropic | API | 88.7 | | Dec 2025 |
| 4 | DeepSeek V3 | DeepSeek | Open Source | 88.5 | | Dec 2025 |
| 5 | Gemini 1.5 Pro | Google | API | 85.9 | | Dec 2025 |
| 6 | Llama 3 70B | Meta | Open Source | 82 | | Dec 2025 |
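
The summary figures (Best Score, Top Model, Models Compared, Score Range) can be derived directly from the table rows. The following is a minimal sketch, not the site's actual code: the `rows` list is transcribed by hand from the table, and the `summarize` helper and its dictionary keys are hypothetical names chosen for illustration. It assumes Score Range means the highest score minus the lowest.

```python
# Leaderboard rows transcribed from the table above: (model, accuracy).
rows = [
    ("o1-preview", 92.3),
    ("GPT-4o", 88.7),
    ("Claude 3.5 Sonnet", 88.7),
    ("DeepSeek V3", 88.5),
    ("Gemini 1.5 Pro", 85.9),
    ("Llama 3 70B", 82.0),
]


def summarize(rows):
    """Hypothetical helper: compute the summary-card values from the rows."""
    scores = [score for _, score in rows]
    # max() returns the first row with the highest score, so ties keep table order.
    top_model, best_score = max(rows, key=lambda row: row[1])
    return {
        "best_score": best_score,
        "top_model": top_model,
        "models_compared": len(rows),
        # Assumption: range = highest minus lowest, rounded to one decimal
        # to hide floating-point noise.
        "score_range": round(max(scores) - min(scores), 1),
    }


summary = summarize(rows)
print(summary)
```

Under these assumptions the computed values agree with the summary card: a best score of 92.3 for o1-preview, 6 models compared, and a range of 10.3 between the first and last rows.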