General OCR Capabilities · 2024 · multilingual
Comprehensive Challenge OCR
A benchmark covering multi-scene text reading, key information extraction, multilingual text, and document parsing.
Metrics: multi-scene-f1, kie-f1, multilingual-f1, document-parsing
Current State of the Art
Gemini 1.5 Pro: 83.25 (multi-scene-f1)
Top Models Performance Comparison
Top 5 models ranked by multi-scene-f1
Best Score: 83.3
Top Model: Gemini 1.5 Pro
Models Compared: 5
Score Range: 10.4
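The summary figures above can be reproduced from the multi-scene-f1 table on this page. The sketch below is illustrative only: the helper names `summarize` and `one_decimal` are hypothetical, the score range is assumed to be the best score minus the lowest score, and the page is assumed to round half up to one decimal place.

```python
from decimal import Decimal, ROUND_HALF_UP

# multi-scene-f1 scores copied from the leaderboard table on this page.
MULTI_SCENE_F1 = {
    "Gemini 1.5 Pro": Decimal("83.25"),
    "Qwen2-VL 72B": Decimal("77.95"),
    "InternVL2-76B": Decimal("76.92"),
    "GPT-4o": Decimal("76.4"),
    "Claude 3.5 Sonnet": Decimal("72.87"),
}


def summarize(scores):
    """Return the top model, best score, model count, and max-minus-min range."""
    top_model = max(scores, key=scores.get)
    best = scores[top_model]
    worst = min(scores.values())
    return {
        "top_model": top_model,
        "best_score": best,
        "models_compared": len(scores),
        # Assumption: "Score Range" means best score minus lowest score.
        "score_range": best - worst,
    }


def one_decimal(value):
    # Assumption: the page rounds half up to one decimal place.
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


summary = summarize(MULTI_SCENE_F1)
print("Top Model:", summary["top_model"])
print("Best Score:", one_decimal(summary["best_score"]))
print("Models Compared:", summary["models_compared"])
print("Score Range:", one_decimal(summary["score_range"]))
```

`Decimal` is used instead of `float` because Python's built-in `round(83.25, 1)` rounds half to even and would not give 83.3.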
document-parsing
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | Gemini 1.5 Pro (API, Google) | 62.37 | | Dec 2025 |
kie-f1
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | Qwen2-VL 72B (Open Source, Alibaba) | 71.76 | | Dec 2025 |
| 2 | Gemini 1.5 Pro (API, Google) | 67.28 | | Dec 2025 |
| 3 | Claude 3.5 Sonnet (API, Anthropic) | 64.58 | | Dec 2025 |
| 4 | GPT-4o (API, OpenAI) | 63.45 | | Dec 2025 |
multi-scene-f1 (Primary)
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | Gemini 1.5 Pro (API, Google) | 83.25 | | Dec 2025 |
| 2 | Qwen2-VL 72B (Open Source, Alibaba) | 77.95 | | Dec 2025 |
| 3 | InternVL2-76B (Open Source, Shanghai AI Lab) | 76.92 | | Dec 2025 |
| 4 | GPT-4o (API, OpenAI) | 76.4 | | Dec 2025 |
| 5 | Claude 3.5 Sonnet (API, Anthropic) | 72.87 | | Dec 2025 |
multilingual-f1
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | Gemini 1.5 Pro (API, Google) | 78.97 | | Dec 2025 |
| 2 | GPT-4o (API, OpenAI) | 73.44 | | Dec 2025 |