OCR benchmark
Higher is better
| # | Model | Score |
|---|---|---:|
| 1 | o4-mini (high) | 98.2 |
| 2 | o3 (high) | 98.1 |
| 3 | o3-mini | 97.9 |
| 4 | o3 | 97.8 |
| 5 | o4-mini | 97.5 |
| 6 | DeepSeek-R1 | 97.3 |
| 7 | Gemini 2.5 Pro | 97.3 |
| 8 | o1 | 96.4 |
| 9 | Claude 3.7 Sonnet | 96.2 |
| 10 | Kimi k1.5 | 96.2 |
| 11 | DeepSeek-R1-Zero | 95.9 |
| 12 | DeepSeek-R1-Distill-Llama-70B | 94.5 |
| 13 | DeepSeek-R1-Distill-Qwen-32B | 94.3 |
| 14 | DeepSeek-V3-0324 | 94 |
| 15 | QwQ-32B | 90.6 |
| 16 | DeepSeek-V3 | 90.2 |
| 17 | o1-mini | 90 |
| 18 | GPT-4.5 Preview | 87.1 |
| 19 | o1-preview | 85.5 |
| 20 | GPT-4.1 | 82.1 |
| 21 | GPT-4o | 76.6 |
| 22 | Grok 2 | 76.1 |
| 23 | Llama 3.1 405B | 73.8 |
| 24 | GPT-4 Turbo | 73.4 |
| 25 | Claude 3.5 Sonnet | 71.1 |
| 26 | GPT-4o mini | 70.2 |
| 27 | Llama 3.1 70B | 68 |
| 28 | Gemini 1.5 Pro | 67.7 |
| 29 | Claude 3 Opus | 60.1 |