Logical Reasoning · 2024 · en

Abstraction and Reasoning Corpus for AGI (v1)

400 evaluation tasks testing abstract visual reasoning. Created by François Chollet. Scores near human average (~85%) remained out of reach for LLMs until 2024.

Samples: 400
Metrics: accuracy
Current State of the Art

o3 (high), OpenAI: 87.5 accuracy

Top Models Performance Comparison

Top 5 models ranked by accuracy

Rank  Model              Accuracy  % of best
1     o3 (high)          87.5      100.0%
2     o3                 87.5      100.0%
3     o4-mini            79.0      90.3%
4     Gemini 2.5 Pro     56.1      64.1%
5     Claude 3.7 Sonnet  30.0      34.3%

Best Score: 87.5
Top Model: o3 (high)
Models Compared: 5
Score Range: 57.5
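
The summary figures and the "% of best" column follow directly from the five listed scores. A minimal sketch of that arithmetic is shown here; the function name `summarize` and the rounding to one decimal place are assumptions made for illustration, not something the page specifies.

```python
# Accuracy scores (percent) as listed on this page.
scores = {
    "o3 (high)": 87.5,
    "o3": 87.5,
    "o4-mini": 79.0,
    "Gemini 2.5 Pro": 56.1,
    "Claude 3.7 Sonnet": 30.0,
}


def summarize(scores):
    """Hypothetical helper: derive the summary statistics from raw scores."""
    best = max(scores.values())
    worst = min(scores.values())
    # On a tie, max() keeps the first model in listing order.
    top_model = max(scores, key=scores.get)
    # Each score expressed as a percentage of the best score
    # (one decimal place is an assumed display choice).
    pct_of_best = {m: round(100 * s / best, 1) for m, s in scores.items()}
    return {
        "best_score": best,
        "top_model": top_model,
        "models_compared": len(scores),
        "score_range": best - worst,
        "pct_of_best": pct_of_best,
    }


summary = summarize(scores)
for model, pct in summary["pct_of_best"].items():
    print(f"{model}: {scores[model]} ({pct}% of best)")
```

Under these assumptions, "Score Range" is simply the best score minus the lowest score among the compared models, and "% of best" normalizes each score by 87.5.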

accuracy (primary metric)

#  Model                    Organization  Score  Paper / Code  Date
1  o3 (high) [API]          OpenAI        87.5   -             Mar 2026
2  o3 [API]                 OpenAI        87.5   -             Mar 2026
3  o4-mini [API]            OpenAI        79     -             Mar 2026
4  Gemini 2.5 Pro [API]     Google        56.1   -             Mar 2026
5  Claude 3.7 Sonnet [API]  Anthropic     30     -             Mar 2026
