CC-OCR

South China University of Technology

A benchmark covering multi-scene text reading, key information extraction, multilingual text reading, and document parsing.

Benchmark Stats

Models: 5
Papers: 12
Metrics: 4


Multi-Scene F1

F1 score on multi-scene text reading

Higher is better

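| Rank | Model | Code | Score | Paper / Source |
|------|-------|------|-------|----------------|
| 1 | gemini-15-pro | - | 83.25% | AlphaXiv |
| 2 | qwen2-vl-72b | HF | 77.95% | AlphaXiv |
| 3 | internvl2-76b | - | 76.92% | AlphaXiv |
| 4 | gpt-4o | - | 76.4% | AlphaXiv |
| 5 | claude-35-sonnet | - | 72.87% | AlphaXiv |

Rank 1 entry note: Multi-Scene Text Reading - Overall F1 score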
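
The F1 scores above combine precision and recall into a single number. This page does not say how CC-OCR matches predicted text against ground truth, so the following is only a minimal sketch under an assumed token-level, multiset-overlap matching rule. The function name `f1_score` and the whitespace tokenization are hypothetical and are not taken from the benchmark's code.

```python
from collections import Counter


def f1_score(predicted, reference):
    """Multiset-overlap F1 between predicted and reference tokens.

    Assumption: a predicted token counts as correct once for each
    occurrence it shares with the reference. The real benchmark may
    normalize or match text differently.
    """
    pred_counts = Counter(predicted)
    ref_counts = Counter(reference)
    # Counter intersection keeps the minimum count of each shared token.
    matched = sum((pred_counts & ref_counts).values())
    if matched == 0:
        # Also covers empty predictions or references (avoids division by zero).
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two of three tokens agree on each side.
score = f1_score("OPEN 24 HOURS".split(), "OPEN 24 HRS".split())
print(f"{score:.2%}")
```

Here precision and recall are both 2/3, so F1 is also 2/3. A leaderboard value such as 83.25% would be this kind of score aggregated over the benchmark's images, although the aggregation method is likewise not stated on this page.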

KIE F1

F1 score on key information extraction

Higher is better

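| Rank | Model | Code | Score | Paper / Source |
|------|-------|------|-------|----------------|
| 1 | qwen2-vl-72b | HF | 71.76% | AlphaXiv |
| 2 | gemini-15-pro | - | 67.28% | AlphaXiv |
| 3 | claude-35-sonnet | - | 64.58% | AlphaXiv |
| 4 | gpt-4o | - | 63.45% | AlphaXiv |

Rank 1 entry note: Key Information Extraction - Overall F1 score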

Multilingual F1

F1 score on multilingual text (10 languages)

Higher is better

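| Rank | Model | Code | Score | Paper / Source |
|------|-------|------|-------|----------------|
| 1 | gemini-15-pro | - | 78.97% | AlphaXiv |
| 2 | gpt-4o | - | 73.44% | AlphaXiv |

Rank 1 entry note: Multilingual Text Reading - 10 languages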

Document Parsing

Average score on document parsing

Higher is better

| Rank | Model | Code | Score | Paper / Source |
|------|-------|------|-------|----------------|
| 1 | gemini-15-pro | - | 62.37 | AlphaXiv |

Rank 1 entry note: Document Parsing - Average Score