Logical Reasoning · 2024 · en
Abstraction and Reasoning Corpus for AGI (v1)
400 evaluation tasks that test abstract visual reasoning, created by François Chollet. Scores near the human average (~85%) remained out of reach for LLMs until 2024.
Current State of the Art
o3 (high), OpenAI: 87.5 accuracy
Top Models Performance Comparison
Top 5 models ranked by accuracy
Best Score: 87.5
Top Model: o3 (high)
Models Compared: 5
Score Range: 57.5 (from 30 to 87.5)
Metric: accuracy (primary)
| # | Model | Organization | Score | Paper / Code | Date |
|---|---|---|---|---|---|
| 1 | o3 (high) (API) | OpenAI | 87.5 | | Mar 2026 |
| 2 | o3 (API) | OpenAI | 87.5 | | Mar 2026 |
| 3 | o4-mini (API) | OpenAI | 79 | | Mar 2026 |
| 4 | Gemini 2.5 Pro (API) | Google | 56.1 | | Mar 2026 |
| 5 | Claude 3.7 Sonnet (API) | Anthropic | 30 | | Mar 2026 |
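
The summary figures on this page (best score, top model, models compared, score range) can be reproduced from the table. The sketch below assumes that "Score Range" means the highest score minus the lowest, which matches 87.5 − 30 = 57.5. The `summarize` helper and the dictionary layout are hypothetical and only serve the illustration.

```python
# Scores copied from the leaderboard table (accuracy, in percent).
scores = {
    "o3 (high)": 87.5,
    "o3": 87.5,
    "o4-mini": 79.0,
    "Gemini 2.5 Pro": 56.1,
    "Claude 3.7 Sonnet": 30.0,
}


def summarize(scores):
    """Return the summary statistics shown above the table.

    Assumption: "score range" is max minus min over the listed models.
    """
    # max() returns the first maximal key in insertion order, so a tie
    # between "o3 (high)" and "o3" resolves to the row listed first.
    top_model = max(scores, key=scores.get)
    best = scores[top_model]
    worst = min(scores.values())
    return {
        "best_score": best,
        "top_model": top_model,
        "models_compared": len(scores),
        "score_range": best - worst,
    }


summary = summarize(scores)
print(summary)
```

Because the two o3 entries are tied at 87.5, which one is reported as the top model depends on row order. Here the first listed row wins.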