SVAMP

1,000 elementary-level math word problems testing robustness of arithmetic reasoning.

Benchmark Stats

Models: 3
Papers: 3
Metrics: 1

accuracy

Higher is better
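
The page does not say how accuracy is computed. A minimal sketch, assuming it is the percentage of problems whose predicted numeric answer matches the reference answer within a small tolerance. The function name `accuracy`, the tolerance `tol`, and the sample values are all hypothetical and used only for illustration:

```python
import math


def accuracy(predictions, references, tol=1e-6):
    """Return the percentage of predictions that match the reference answers.

    Assumption: answers are plain numbers and a prediction counts as correct
    when it is within `tol` of the reference. The benchmark's official
    scoring rule may differ.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty set of problems")
    correct = sum(
        1
        for p, r in zip(predictions, references)
        if math.isclose(p, r, abs_tol=tol)
    )
    return 100.0 * correct / len(references)


# Hypothetical model outputs and gold answers for four word problems.
preds = [8.0, 15.0, 3.0, 42.0]
golds = [8.0, 14.0, 3.0, 42.0]
print(accuracy(preds, golds))  # 75.0
```

Under this assumption, a score of 93.7 would correspond to 937 of the 1,000 problems answered correctly.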

Rank  Model             Code  Score  Paper / Source
1     gpt-4o            -     93.7   arXiv Paper
2     claude-35-sonnet  -     91.2   arXiv Paper
3     llama-3-70b       HF    89.5   meta-blog

Simple Variations on Arithmetic Math word Problems.