Mathematical Reasoning | 2021 | en

Grade School Math 8K

About 8,500 grade school math word problems that require multi-step reasoning. One of the most widely used math reasoning benchmarks.

Metrics: accuracy
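
As a rough illustration of the accuracy metric, the sketch below scores answers by exact match on the final number. It assumes both the reference solutions and the model outputs end with a `#### <number>` marker; requiring that format from model outputs is an assumption made here, and the helper names `extract_final_answer` and `accuracy` are hypothetical. Evaluation harnesses differ in how they parse answers, so this is not the procedure behind the scores on this page.

```python
import re


def extract_final_answer(text):
    """Return the number after the last '####' marker as a float, or None."""
    matches = re.findall(r"####\s*(-?[\d,]*\.?\d+)", text)
    if not matches:
        return None
    # Drop thousands separators such as "1,234" before converting.
    return float(matches[-1].replace(",", ""))


def accuracy(predictions, references):
    """Percentage of predictions whose final answer equals the reference."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    correct = 0
    for pred, ref in zip(predictions, references):
        pred_value = extract_final_answer(pred)
        ref_value = extract_final_answer(ref)
        # A missing answer on either side counts as incorrect.
        if pred_value is not None and pred_value == ref_value:
            correct += 1
    return 100.0 * correct / len(references)


# Hypothetical toy data, not taken from the benchmark.
refs = ["6 * 12 = 72 #### 72", "5 + 5 = 10 #### 10", "#### 1,200", "#### 3.5"]
preds = ["So the total is 72. #### 72", "#### 11", "#### 1200", "I am not sure."]
print(accuracy(preds, refs))
```

A prediction with no parsable final answer is counted as wrong rather than skipped, so the denominator is always the full set of problems.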
Current State of the Art

o1-preview (OpenAI): 97.8 accuracy

Top Models Performance Comparison

Top 5 models ranked by accuracy

Rank  Model              Accuracy  % of best
1     o1-preview         97.8      100.0%
2     Claude 3.5 Sonnet  96.4      98.6%
3     Llama 3 70B        93.0      95.1%
4     GPT-4o             92.0      94.1%
5     Gemini 1.5 Pro     91.7      93.8%

Best Score: 97.8
Top Model: o1-preview
Models Compared: 5
Score Range: 6.1
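
The "% of best" values and the score range can be reproduced from the five listed accuracies. The snippet below is a minimal sketch of that arithmetic; the variable names are illustrative only, and rounding to one decimal place is assumed from the figures shown on this page.

```python
# Accuracy scores as listed in the comparison on this page.
scores = {
    "o1-preview": 97.8,
    "Claude 3.5 Sonnet": 96.4,
    "Llama 3 70B": 93.0,
    "GPT-4o": 92.0,
    "Gemini 1.5 Pro": 91.7,
}

best = max(scores.values())
worst = min(scores.values())

# Each model's score as a percentage of the best score, to one decimal place.
pct_of_best = {model: round(100 * s / best, 1) for model, s in scores.items()}

# Score range: the gap between the best and the worst listed score.
score_range = round(best - worst, 1)

for model, pct in pct_of_best.items():
    print(f"{model}: {scores[model]} ({pct}% of best)")
print(f"Score range: {score_range}")
```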

accuracy (primary metric)

#  Model              Organization  Access       Score  Date
1  o1-preview         OpenAI        -            97.8   Dec 2025
2  Claude 3.5 Sonnet  Anthropic     API          96.4   Dec 2025
3  Llama 3 70B        Meta          Open Source  93.0   Dec 2025
4  GPT-4o             OpenAI        API          92.0   Dec 2025
5  Gemini 1.5 Pro     Google        API          91.7   Dec 2025

Other Mathematical Reasoning Datasets