MATH

12,500 competition mathematics problems drawn from AMC, AIME, and other sources. Harder than GSM8K, which covers grade-school word problems.

Benchmark Stats

Models: 5
Papers: 5
Metrics: 1

accuracy

Higher is better
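
As a rough illustration of the metric, accuracy can be read as the percentage of problems whose final answer matches the reference answer. The sketch below assumes exact string matching after a simple whitespace normalization. The page does not say how its scores were graded, so the `normalize` and `accuracy` helpers here are hypothetical and not the leaderboard's own scoring code.

```python
def normalize(answer: str) -> str:
    """Hypothetical normalization (an assumption): remove all whitespace."""
    return "".join(answer.split())


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches after normalization; True counts as 1 in sum().
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Made-up answers for illustration only; two of three match.
    preds = ["1/2", " 7", "3"]
    refs = ["1/2", "7", "4"]
    print(f"{accuracy(preds, refs):.1f}")
```

Under this reading, a higher score means a larger share of problems answered correctly.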

Rank  Model             Code  Score  Paper / Source
1     o1-preview        -     94.8   openai-blog
2     deepseek-v3       -     90.2   deepseek-blog
3     gpt-4o            -     76.6   openai-blog
4     claude-35-sonnet  -     71.1   anthropic-blog
5     gemini-15-pro     -     67.7   google-blog

Note on o1-preview: Competition mathematics. Massive improvement over GPT-4.