humaneval
Unknown
Code generation benchmark
Total Results: 18
Models Tested: 18
Metrics: 1
Last Updated: 2026-03-06
Metric: pass@1 (higher is better)
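
For readers unfamiliar with the metric, pass@1 is the k = 1 case of pass@k: the estimated probability that at least one of k sampled solutions to a problem passes its unit tests, averaged over all problems. The sketch below uses the standard unbiased estimator, 1 - C(n - c, k) / C(n, k), for n samples of which c are correct. The function names `pass_at_k` and `benchmark_score` and the example counts are illustrative assumptions; this page does not state how many samples per problem were drawn for the scores listed here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: sample budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n samples contains no passing sample at all.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, as a percentage.

    results holds one (n, c) pair per problem (hypothetical input format).
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Made-up counts: four problems, one sample each, three of them solved.
    print(benchmark_score([(1, 1), (1, 0), (1, 1), (1, 1)]))  # 75.0
```

With k = 1 the estimator reduces to c / n per problem, so a score in the table can be read as the average percentage of problems solved on a single attempt.
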
| Rank | Model | Score | Source |
|---|---|---|---|
| 1 | o4-mini | 97.3 | openai-simple-evals |
| 2 | o3-mini | 96.3 | openai-simple-evals |
| 3 | gpt-41 | 94.5 | openai-simple-evals |
| 4 | o1-preview | 92.4 | openai-simple-evals |
| 5 | o1-mini | 92.4 | openai-simple-evals |
| 6 | claude-35-sonnet | 92 | openai-simple-evals |
| 7 | gpt-4o | 91 | openai-simple-evals |
| 8 | llama-31-405b | 89 | openai-simple-evals |
| 9 | gpt-45-preview | 88.6 | openai-simple-evals |
| 10 | grok-2 | 88.4 | openai-simple-evals |
| 11 | gpt-4-turbo | 88.2 | openai-simple-evals |
| 12 | o3 | 87.4 | openai-simple-evals |
| 13 | gpt-4o-mini | 87.2 | openai-simple-evals |
| 14 | claude-3-opus | 84.9 | openai-simple-evals |
| 15 | deepseek-v3 | 82.6 | openai-simple-evals |
| 16 | llama-3-70b | 81.7 | openai-simple-evals |
| 17 | llama-31-70b | 80.5 | openai-simple-evals |
| 18 | gemini-15-pro | 71.9 | openai-simple-evals |