MME-VideoOCR

NTU

A benchmark of 1,464 videos with 2,000 QA pairs across 25 tasks, testing OCR capabilities in video content.

Benchmark Stats

Models: 6
Papers: 6
Metrics: 1


Total Accuracy

Overall accuracy across all video OCR tasks

Higher is better
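
As a rough illustration of the metric, the sketch below assumes Total Accuracy is the fraction of QA pairs answered correctly, pooled over all tasks. The page does not state the exact aggregation, so this pooling is an assumption, and the record layout, function names, and task names are hypothetical.

```python
from collections import defaultdict


def total_accuracy(results):
    """Pooled accuracy: correct answers divided by all QA pairs (assumed aggregation)."""
    if not results:
        raise ValueError("no results to score")
    correct = sum(1 for r in results if r["correct"])
    return correct / len(results)


def per_task_accuracy(results):
    """Accuracy broken down by task, useful for seeing where a model fails."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        totals[r["task"]] += 1
        hits[r["task"]] += int(r["correct"])
    return {task: hits[task] / totals[task] for task in totals}


# Toy records with hypothetical task names; a real run would have 2,000 entries.
results = [
    {"task": "text_recognition", "correct": True},
    {"task": "text_recognition", "correct": False},
    {"task": "text_recognition", "correct": True},
    {"task": "text_grounding", "correct": True},
]

overall = total_accuracy(results)
by_task = per_task_accuracy(results)
print(f"Total accuracy: {overall:.1%}")
for task, acc in sorted(by_task.items()):
    print(f"  {task}: {acc:.1%}")
```

If the benchmark instead averages per-task accuracies, the overall number would differ whenever tasks have unequal numbers of QA pairs.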

Rank  Model           Code  Score  Paper / Source
1     gemini-25-pro   -     73.7%  AlphaXiv
2     qwen25-vl-72b   HF    69%    AlphaXiv
3     internvl3-78b   -     67.2%  AlphaXiv
4     gpt-4o          -     66.4%  AlphaXiv
5     gemini-15-pro   -     64.9%  AlphaXiv
6     qwen25-vl-32b   HF    61%    AlphaXiv