Commonsense Reasoning | 2019 | en

CommonsenseQA

12,247 multiple choice questions requiring commonsense reasoning about everyday concepts.

Metrics: accuracy
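As a minimal sketch of how accuracy is typically computed for a multiple-choice benchmark like this one: the score is the percentage of questions where the predicted answer label matches the gold label. The function name `accuracy` and the sample labels below are hypothetical and are not taken from the benchmark's own evaluation code.

```python
def accuracy(predictions, gold):
    """Return the percentage of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and gold answer labels.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical answer labels for four questions (illustration only).
gold = ["A", "C", "E", "B"]
predictions = ["A", "C", "D", "B"]
score = accuracy(predictions, gold)
print(f"accuracy: {score:.1f}")
```

Scores on this page, such as 85.4, are on the same 0 to 100 scale.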
Current State of the Art

GPT-4o

OpenAI

85.4

accuracy

Top Models Performance Comparison

Top 3 models ranked by accuracy

Rank  Model              Accuracy  % of best
1     GPT-4o             85.4      100.0%
2     Claude 3.5 Sonnet  83.2      97.4%
3     Llama 3 70B        80.9      94.7%

Best Score: 85.4
Top Model: GPT-4o
Models Compared: 3
Score Range: 4.5
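
The summary figures above follow directly from the three listed scores. A short sketch of the arithmetic, assuming "% of best" means each score divided by the top score and "Score Range" means the top score minus the lowest score (both readings are consistent with the numbers shown, but the page does not state its formulas):

```python
# Accuracy scores as listed on this page.
scores = {
    "GPT-4o": 85.4,
    "Claude 3.5 Sonnet": 83.2,
    "Llama 3 70B": 80.9,
}

# Top model and its score.
best_model = max(scores, key=scores.get)
best_score = scores[best_model]

# Each score as a percentage of the best score, rounded to one decimal.
pct_of_best = {m: round(100 * s / best_score, 1) for m, s in scores.items()}

# Gap between the highest and lowest score, rounded to one decimal.
score_range = round(best_score - min(scores.values()), 1)

print(best_model, best_score)
print(pct_of_best)
print(score_range)
```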

accuracy (primary metric)

#  Model              Access       Organization  Score  Date
1  GPT-4o             API          OpenAI        85.4   Dec 2025
2  Claude 3.5 Sonnet  API          Anthropic     83.2   Dec 2025
3  Llama 3 70B        Open Source  Meta          80.9   Dec 2025


CommonsenseQA Benchmark - Commonsense Reasoning | CodeSOTA