Arithmetic Reasoning | 2016 | en

Math Word Problem Repository

3,320 arithmetic word problems from various sources, testing basic arithmetic reasoning.

Metrics: accuracy

Current State of the Art

GPT-4o (OpenAI): 97.2 accuracy

Top Models Performance Comparison

Top 3 models ranked by accuracy

Rank  Model              accuracy  % of best
1     GPT-4o             97.2      100.0%
2     Claude 3.5 Sonnet  95.8      98.6%
3     Llama 3 70B        94.1      96.8%

Best Score: 97.2
Top Model: GPT-4o
Models Compared: 3
Score Range: 3.1
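
The summary figures can be reproduced from the three listed scores. The following is a minimal sketch, assuming that "% of best" means each score divided by the top score and that "Score Range" means the gap between the highest and lowest score; the page does not state either definition, and the function name `summarize` is hypothetical.

```python
# Accuracy scores as listed on the leaderboard (in percent).
scores = {
    "GPT-4o": 97.2,
    "Claude 3.5 Sonnet": 95.8,
    "Llama 3 70B": 94.1,
}


def summarize(scores):
    """Derive the leaderboard summary figures from a model -> score mapping.

    Assumed definitions (not stated on the page):
      score_range = best score - worst score
      pct_of_best = 100 * score / best score
    """
    top_model = max(scores, key=scores.get)
    best = scores[top_model]
    worst = min(scores.values())
    return {
        "best_score": best,
        "top_model": top_model,
        "models_compared": len(scores),
        "score_range": round(best - worst, 1),
        "pct_of_best": {
            model: round(100 * score / best, 1)
            for model, score in scores.items()
        },
    }


summary = summarize(scores)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions the computed values agree with the figures shown on the page (score range 3.1; 98.6% and 96.8% of best for the second and third models).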

accuracy (primary metric)

#  Model                                  Score  Paper / Code  Date
1  GPT-4o (OpenAI, API)                   97.2   -             Dec 2025
2  Claude 3.5 Sonnet (Anthropic, API)     95.8   -             Dec 2025
3  Llama 3 70B (Meta, Open Source)        94.1   -             Dec 2025

MAWPS Benchmark - Arithmetic Reasoning | CodeSOTA