AIME 2024 is a benchmark of 30 challenging math problems from the 2024 AIME competition. It tests advanced mathematical reasoning.
Accuracy is the reported evaluation metric for AIME 2024. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher scores are better.
| Rank | Model | Trust | Accuracy (%) | Year | Links |
|---|---|---|---|---|---|
| 01 | o3 | verified | 96.7 | 2026 | Source ↗ |
| 02 | o4-mini | verified | 93.4 | 2026 | Source ↗ |
| 03 | Gemini 2.5 Pro | verified | 92 | 2026 | Source ↗ |
| 04 | GLM-4.5-Air | unverified | 89.4 | 2025 | Paper ↗ Code ↗ Source ↗ |
| 05 | Qwen3-Coder-Next | unverified | 89.01 | 2026 | Paper ↗ Code ↗ |
| 06 | Qwen3-235B-A22B | unverified | 85.7 | 2025 | Paper ↗ Code ↗ |
| 07 | o1-preview | paper | 83.3 | 2025 | Source ↗ |
| 08 | Claude 3.7 Sonnet | verified | 80 | 2026 | Source ↗ |
| 09 | DeepSeek R1 | verified | 79.8 | 2026 | Source ↗ |
| 10 | Claude 3.5 Opus | unverified | 16 | 2025 | Source ↗ |
| 11 | claude-35-opus | paper | 16 | 2025 | Source ↗ |
| 12 | GPT-4o | unverified | 13.4 | 2025 | Source ↗ |
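
As a rough illustration of how the accuracy metric and the "higher is better" ordering fit together, here is a minimal Python sketch. It assumes accuracy is simply the share of the 30 problems answered correctly, expressed as a percentage; individual sources may instead average over several sampled runs, which would explain scores that do not correspond to a whole number of problems. The function names `accuracy_percent` and `rank_models` are hypothetical and not part of any Codesota tooling, and the sample rows are copied from the table above.

```python
from typing import List, Tuple


def accuracy_percent(num_correct: float, num_problems: int = 30) -> float:
    """Return accuracy as a percentage rounded to one decimal place.

    Assumption: accuracy = correct answers / total problems. num_correct may
    be fractional when it is an average over several runs.
    """
    if num_problems <= 0:
        raise ValueError("num_problems must be positive")
    if not 0 <= num_correct <= num_problems:
        raise ValueError("num_correct must lie between 0 and num_problems")
    return round(100.0 * num_correct / num_problems, 1)


def rank_models(scores: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Sort (model, score) pairs in descending order, since higher is better."""
    ordered = sorted(scores, key=lambda row: row[1], reverse=True)
    return [(rank, model, score) for rank, (model, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # 29 of 30 problems correct matches the top score in the table.
    print(accuracy_percent(29))

    # A few rows taken from the table, deliberately out of order.
    sample = [("GPT-4o", 13.4), ("o3", 96.7), ("DeepSeek R1", 79.8)]
    for rank, model, score in rank_models(sample):
        print(f"{rank:02d} {model} {score}")
```

Under this assumption each problem is worth about 3.3 percentage points, so a gap smaller than that between two models amounts to less than one problem on a single run.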