Arithmetic Reasoning | 2021 | en

Simple Variations on Arithmetic Math Word Problems

1,000 elementary-level math word problems testing robustness of arithmetic reasoning.

Metrics: accuracy
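As a rough illustration of the accuracy metric, the sketch below scores numeric answers against reference answers. The page does not specify the matching rule, so the function name `accuracy`, the tolerance `tol`, and the choice of numeric comparison are all assumptions made for this example, not the benchmark's official scorer.

```python
import math


def accuracy(predictions, references, tol=1e-6):
    """Return the percentage of predictions that match their reference answer.

    Assumption: answers are numeric and count as correct when they agree
    within an absolute tolerance `tol` (hypothetical choice for this sketch).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")

    # Count the examples whose predicted value is close to the reference value.
    correct = sum(
        math.isclose(pred, ref, abs_tol=tol)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Toy data (not taken from the benchmark): three of four answers match.
toy_score = accuracy([3.0, 5.0, 7.5, 10.0], [3.0, 5.0, 7.0, 10.0])
```

Under this reading, a score of 93.7 on 1,000 problems would correspond to 937 correct answers.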
Current State of the Art

GPT-4o

OpenAI

93.7

accuracy

Top Models Performance Comparison

Top 3 models ranked by accuracy

Rank  Model              accuracy  % of best
1     GPT-4o             93.7      100.0%
2     Claude 3.5 Sonnet  91.2      97.3%
3     Llama 3 70B        89.5      95.5%

Best Score: 93.7
Top Model: GPT-4o
Models Compared: 3
Score Range: 4.2
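
The summary figures above can be reproduced from the three listed scores. The following is a minimal sketch, assuming "% of best" means each score divided by the top score and "Score Range" means the top score minus the lowest score; the function name `summarize` and the dictionary keys are hypothetical.

```python
# Scores as listed on this page (accuracy, in percent).
scores = {
    "GPT-4o": 93.7,
    "Claude 3.5 Sonnet": 91.2,
    "Llama 3 70B": 89.5,
}


def summarize(scores):
    """Derive the leaderboard summary figures from a model -> score mapping."""
    top_model = max(scores, key=scores.get)
    best = scores[top_model]
    worst = min(scores.values())
    return {
        "best_score": best,
        "top_model": top_model,
        "models_compared": len(scores),
        # Assumed definition: gap between the highest and lowest score.
        "score_range": round(best - worst, 1),
        # Assumed definition: each score as a percentage of the best score.
        "pct_of_best": {
            model: round(100 * score / best, 1)
            for model, score in scores.items()
        },
    }


summary = summarize(scores)
```

With these definitions the results agree with the page: a range of 4.2 and relative scores of 100.0%, 97.3%, and 95.5%.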

accuracy (primary)

#  Model                      Organization  Score  Date
1  GPT-4o (API)               OpenAI        93.7   Dec 2025
2  Claude 3.5 Sonnet (API)    Anthropic     91.2   Dec 2025
3  Llama 3 70B (Open Source)  Meta          89.5   Dec 2025

SVAMP Benchmark - Arithmetic Reasoning | CodeSOTA