General OCR Capabilities | 2024 | multilingual

Comprehensive Challenge OCR

A benchmark covering multi-scene text reading, key information extraction, multilingual text, and document parsing.

Metrics: multi-scene-f1, kie-f1, multilingual-f1, document-parsing
Current State of the Art

Gemini 1.5 Pro (Google): 83.25 multi-scene-f1

Top Models Performance Comparison

Top 5 models ranked by multi-scene-f1

#  Model              multi-scene-f1  % of best
1  Gemini 1.5 Pro     83.3            100.0%
2  Qwen2-VL 72B       78.0            93.6%
3  InternVL2-76B      76.9            92.4%
4  GPT-4o             76.4            91.8%
5  Claude 3.5 Sonnet  72.9            87.5%
Best Score: 83.3
Top Model: Gemini 1.5 Pro
Models Compared: 5
Score Range: 10.4
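
The "% of best" values and the Score Range in this comparison look like they are derived directly from the raw multi-scene-f1 scores. The sketch below reproduces that arithmetic under two assumptions that the page does not state: "% of best" is each score divided by the top score, and Score Range is the top score minus the lowest of the five. The function name `summarize` is hypothetical and used only for illustration.

```python
# Raw multi-scene-f1 scores as listed on this page.
scores = {
    "Gemini 1.5 Pro": 83.25,
    "Qwen2-VL 72B": 77.95,
    "InternVL2-76B": 76.92,
    "GPT-4o": 76.4,
    "Claude 3.5 Sonnet": 72.87,
}


def summarize(scores):
    """Derive the comparison statistics from raw scores.

    Assumption: "% of best" = 100 * score / best score, and
    "Score Range" = best score - worst score.
    """
    top_model = max(scores, key=scores.get)
    best = scores[top_model]
    worst = min(scores.values())
    return {
        "top_model": top_model,
        "best_score": best,
        "models_compared": len(scores),
        "score_range": round(best - worst, 1),
        "pct_of_best": {m: round(100 * s / best, 1) for m, s in scores.items()},
    }


summary = summarize(scores)
for model, pct in summary["pct_of_best"].items():
    print(f"{model}: {pct}% of best")
print("Score range:", summary["score_range"])
```

Under these assumptions the computed percentages and range agree with the figures displayed on the page, which suggests the chart rounds the raw scores to one decimal place for display only.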

document-parsing

#  Model           Access  Organization  Score  Date
1  Gemini 1.5 Pro  API     Google        62.37  Dec 2025

kie-f1

#  Model              Access       Organization  Score  Date
1  Qwen2-VL 72B       Open Source  Alibaba       71.76  Dec 2025
2  Gemini 1.5 Pro     API          Google        67.28  Dec 2025
3  Claude 3.5 Sonnet  API          Anthropic     64.58  Dec 2025
4  GPT-4o             API          OpenAI        63.45  Dec 2025

multi-scene-f1 (Primary)

#  Model              Access       Organization     Score  Date
1  Gemini 1.5 Pro     API          Google           83.25  Dec 2025
2  Qwen2-VL 72B       Open Source  Alibaba          77.95  Dec 2025
3  InternVL2-76B      Open Source  Shanghai AI Lab  76.92  Dec 2025
4  GPT-4o             API          OpenAI           76.4   Dec 2025
5  Claude 3.5 Sonnet  API          Anthropic        72.87  Dec 2025

multilingual-f1

#  Model           Access  Organization  Score  Date
1  Gemini 1.5 Pro  API     Google        78.97  Dec 2025
2  GPT-4o          API     OpenAI        73.44  Dec 2025

Other General OCR Capabilities Datasets