Harder successor to ARC-AGI-1, released in 2025. Designed to be more resistant to test-time compute scaling. Scores are reported as % accuracy on the public evaluation set.
Accuracy is the reported evaluation metric for ARC-AGI-2. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score (%) | Year |
|---|---|---|---|---|
| 01 | Gemini 2.5 Pro | verified | 5.00 | 2026 |
| 02 | o3 | verified | 4.00 | 2026 |
| 03 | o4-mini | verified | 3.00 | 2026 |
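
Since higher accuracy is better, the ranking above amounts to a descending sort on score. The sketch below reproduces that ordering from the three rows in the table. The `Entry` class and `rank` function are hypothetical names chosen for illustration, not part of any Codesota tooling, and how ties would be broken is an assumption (here, input order is preserved).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    score: float  # accuracy in percent on the public evaluation set
    year: int


def rank(entries):
    """Return (position, entry) pairs, best score first.

    sorted() is stable, so entries with equal scores keep their input order.
    That tie-breaking rule is an assumption, not something the table states.
    """
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    return list(enumerate(ordered, start=1))


# Rows copied from the table above, deliberately given out of order.
ENTRIES = [
    Entry("o4-mini", 3.00, 2026),
    Entry("Gemini 2.5 Pro", 5.00, 2026),
    Entry("o3", 4.00, 2026),
]

if __name__ == "__main__":
    for position, entry in rank(ENTRIES):
        print(f"{position:02d} | {entry.model} | {entry.score:.2f}")
```

Running the script prints the rows in the same order as the table, with zero-padded rank numbers.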