Mathematical Reasoning | 2021 | en

Mathematics Aptitude Test of Heuristics

12,500 competition mathematics problems (7,500 train, 5,000 test) drawn from AMC, AIME, and other competitions, covering algebra, geometry, number theory, and more. Harder than GSM8K. Modern evaluations typically report results on MATH-500, a representative 500-problem subset of the test set.

Metrics: accuracy
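As a rough illustration of the accuracy metric, the sketch below scores a model by the percentage of problems whose final answer matches the reference. The `normalize` helper and the sample answers are hypothetical; real MATH graders typically apply more elaborate answer extraction and equivalence checks, so this is only a minimal approximation.

```python
def normalize(answer: str) -> str:
    """Hypothetical normalization: trim whitespace, drop a trailing period and inner spaces."""
    return answer.strip().rstrip(".").replace(" ", "")


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Percentage of predictions that match the reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one problem is required")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Made-up example answers, not taken from the dataset.
preds = ["\\frac{1}{2}", "42", "x = 3"]
refs = ["\\frac{1}{2}", "41", "x=3"]
print(f"accuracy: {accuracy(preds, refs):.1f}")
```

With a single sampled answer per problem, this corresponds to what leaderboards usually call pass@1.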
Current State of the Art

o3-mini

OpenAI

97.9

accuracy

Accuracy Progress Over Time

Showing 2 breakthroughs from Jan 2025 to Mar 2026

[Chart: accuracy (y-axis, 97.2 to 98.0) by date (x-axis, Jan 2025 to Mar 2026)]

Key Milestones

Jan 2025: DeepSeek-R1, 97.3
MATH-500, from official DeepSeek-R1 paper.

Mar 2026: o3-mini (Current SOTA), 97.9 (+0.6%)
MATH-500, zero-shot CoT, pass@1. High reasoning effort.

Total Improvement: 0.6%
Time Span: 1y 2m
Breakthroughs: 2
Current SOTA: 97.9
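The summary figures can be rechecked from the two milestone scores. The page does not say whether "0.6%" is an absolute gain in accuracy points or a relative gain over the earlier score, so the sketch below computes both; the variable names are illustrative only.

```python
# Milestone scores and dates as listed under Key Milestones.
prev_score, sota_score = 97.3, 97.9

# Absolute gain in accuracy points.
absolute_gain = round(sota_score - prev_score, 1)

# Relative gain over the earlier score, in percent.
relative_gain = round(100 * (sota_score - prev_score) / prev_score, 2)

# Time span from Jan 2025 to Mar 2026, in whole months.
months = (2026 - 2025) * 12 + (3 - 1)
years, rem_months = divmod(months, 12)

print(f"absolute gain: {absolute_gain} points")
print(f"relative gain: {relative_gain}%")
print(f"time span: {years}y {rem_months}m")
```

Both readings round to 0.6 at one decimal place, so either interpretation is consistent with the figure shown.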

Top Models Performance Comparison

Top 10 models ranked by accuracy

Rank. Model: accuracy (% of best)

1. o3-mini: 97.9 (100.0%)
2. o3: 97.8 (99.9%)
3. o4-mini: 97.5 (99.6%)
4. DeepSeek-R1: 97.3 (99.4%)
5. o1: 96.4 (98.5%)
6. Claude 3.7 Sonnet: 96.2 (98.3%)
7. DeepSeek V3: 90.2 (92.1%)
8. o1-mini: 90.0 (91.9%)
9. GPT-4.5 Preview: 87.1 (89.0%)
10. o1-preview: 85.5 (87.3%)

Best Score: 97.9
Top Model: o3-mini
Models Compared: 10
Score Range: 12.4
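The "% of best" values and the score range in the top-10 comparison follow directly from the listed scores. A minimal sketch of that arithmetic, assuming each percentage is the model's score divided by the best score and rounded to one decimal place:

```python
# Top-10 accuracy scores as listed in the comparison.
top10 = {
    "o3-mini": 97.9,
    "o3": 97.8,
    "o4-mini": 97.5,
    "DeepSeek-R1": 97.3,
    "o1": 96.4,
    "Claude 3.7 Sonnet": 96.2,
    "DeepSeek V3": 90.2,
    "o1-mini": 90.0,
    "GPT-4.5 Preview": 87.1,
    "o1-preview": 85.5,
}

best_model = max(top10, key=top10.get)
best_score = top10[best_model]

# Each model's score as a percentage of the best score.
pct_of_best = {m: round(100 * s / best_score, 1) for m, s in top10.items()}

# Gap between the highest and lowest of the ten scores.
score_range = round(best_score - min(top10.values()), 1)

for model, pct in pct_of_best.items():
    print(f"{model}: {top10[model]} ({pct}% of best)")
print(f"top model: {best_model}, score range: {score_range}")
```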

accuracy (primary metric)

#   Model              Access       Organization  Score  Date
1   o3-mini            API          OpenAI        97.9   Mar 2026
2   o3                 API          OpenAI        97.8   Mar 2026
3   o4-mini            API          OpenAI        97.5   Mar 2026
4   DeepSeek-R1        Open Source  DeepSeek      97.3   Mar 2026
5   o1                 API          OpenAI        96.4   Mar 2026
6   Claude 3.7 Sonnet  API          Anthropic     96.2   Mar 2026
7   DeepSeek V3        Open Source  DeepSeek      90.2   Mar 2026
8   o1-mini            API          OpenAI        90.0   Mar 2026
9   GPT-4.5 Preview    API          OpenAI        87.1   Mar 2026
10  o1-preview         -            OpenAI        85.5   Mar 2026
11  GPT-4.1            API          OpenAI        82.1   Mar 2026
12  GPT-4o             API          OpenAI        76.6   Mar 2026
13  Grok 2             API          xAI           76.1   Mar 2026
14  Llama 3.1 405B     Open Source  Meta          73.8   Mar 2026
15  GPT-4 Turbo        API          OpenAI        73.4   Mar 2026
16  Claude 3.5 Sonnet  API          Anthropic     71.1   Mar 2026
17  GPT-4o Mini        -            OpenAI        70.2   Mar 2026
18  Llama 3.1 70B      Open Source  Meta          68.0   Mar 2026
19  Gemini 1.5 Pro     API          Google        67.7   Mar 2026
20  Claude 3 Opus      API          Anthropic     60.1   Mar 2026


MATH Benchmark - Mathematical Reasoning | CodeSOTA