math

OCR benchmark

Total Results: 20
Models Tested: 20
Metrics: 1
Last Updated: 2026-03-06

Metric: accuracy (%, higher is better)

| Rank | Model | Notes | Score | Source |
|------|-------|-------|-------|--------|
| 1 | o3-mini | MATH-500, zero-shot CoT, pass@1. High reasoning effort. | 97.9 | openai-simple-evals |
| 2 | o3 | MATH-500, zero-shot CoT, pass@1. Default reasoning effort. | 97.8 | openai-simple-evals |
| 3 | o4-mini | MATH-500, zero-shot CoT, pass@1. Default reasoning effort. | 97.5 | openai-simple-evals |
| 4 | deepseek-r1 | MATH-500, from official DeepSeek-R1 paper. On par with OpenAI o1. | 97.3 | deepseek-paper |
| 5 | o1 | MATH-500, zero-shot CoT, pass@1. | 96.4 | openai-simple-evals |
| 6 | claude-37-sonnet | MATH-500 with extended thinking enabled. | 96.2 | anthropic-blog |
| 7 | deepseek-v3 | Non-reasoning base model. | 90.2 | deepseek-blog |
| 8 | o1-mini | MATH-500, zero-shot CoT, pass@1. | 90 | openai-simple-evals |
| 9 | gpt-45-preview | Full MATH test set, zero-shot CoT. | 87.1 | openai-simple-evals |
| 10 | o1-preview | MATH-500, zero-shot CoT, pass@1. | 85.5 | openai-simple-evals |
| 11 | gpt-41 | Full MATH test set, zero-shot CoT. | 82.1 | openai-simple-evals |
| 12 | gpt-4o | Full MATH test set, zero-shot CoT. gpt-4o-2024-05-13. | 76.6 | openai-simple-evals |
| 13 | grok-2 | Full MATH test set. | 76.1 | openai-simple-evals |
| 14 | llama-31-405b | Full MATH test set. | 73.8 | openai-simple-evals |
| 15 | gpt-4-turbo | Full MATH test set, zero-shot CoT. | 73.4 | openai-simple-evals |
| 16 | claude-35-sonnet | Full MATH test set. Original Claude 3.5 Sonnet (June 2024). | 71.1 | openai-simple-evals |
| 17 | gpt-4o-mini | Full MATH test set, zero-shot CoT. | 70.2 | openai-simple-evals |
| 18 | llama-31-70b | Full MATH test set. | 68 | openai-simple-evals |
| 19 | gemini-15-pro | From Google's official evaluation. | 67.7 | google-blog |
| 20 | claude-3-opus | Full MATH test set. | 60.1 | openai-simple-evals |
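
Many of the entries above report pass@1 accuracy, meaning one sampled answer per problem is graded and the score is the percentage judged correct. As a rough illustration only, the sketch below computes such a score in Python. The function names `normalize` and `pass_at_1_accuracy` are hypothetical, and the whitespace-stripping string comparison is an assumption made for brevity; the graders behind the listed sources likely use more elaborate answer-equivalence checks.

```python
def normalize(answer: str) -> str:
    """Hypothetical normalizer: drop all whitespace before comparing.

    Assumption for illustration; real math graders usually check
    mathematical equivalence, not string equality.
    """
    return "".join(answer.split())


def pass_at_1_accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of problems whose single answer matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one problem is required")
    # One prediction per problem: this is what makes the metric pass@1.
    correct = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return round(100.0 * correct / len(references), 1)


if __name__ == "__main__":
    # Toy data, not taken from any benchmark.
    preds = ["1/2", " 3", "x+1", "7"]
    refs = ["1/2", "3", "x + 1", "8"]
    print(pass_at_1_accuracy(preds, refs))
```

With this definition, a score such as 97.9 on a 500-problem set would correspond to roughly 489 or 490 problems answered correctly on the first attempt.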
