KOSMOS-2.5.

Microsoft · multimodal · ms-research

Microsoft KOSMOS-2.5 is a multimodal literate model for machine reading of text-intensive images.

GitHub ↗

§ 02 · Benchmarks

Every benchmark KOSMOS-2.5 has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	CC-OCR	Computer Vision · General OCR Capabilities	multilingual-f1	36.2%	#7/8	—	source ↗
02	CC-OCR	Computer Vision · General OCR Capabilities	multi-scene-f1	47.5%	#9/9	—	source ↗

The Rank column shows this model’s position among all models scored on the same benchmark and metric (the number of competitors follows the slash). A #1 in red marks the current SOTA. Rows are sorted by rank, then by newest result.
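As an illustration of the Rank column, the sketch below parses a cell such as `#7/8` and orders the two rows above by rank. The names `Rank`, `parse_rank`, and `rows` are hypothetical and not part of any published tooling. The tie-break on newest result is left out because both rows show no date.

```python
import re
from typing import NamedTuple


class Rank(NamedTuple):
    position: int  # number before the slash
    field: int     # number after the slash, kept exactly as printed


def parse_rank(cell: str) -> Rank:
    """Parse a Rank cell of the form '#<position>/<field>'."""
    match = re.fullmatch(r"#(\d+)/(\d+)", cell.strip())
    if match is None:
        raise ValueError(f"unrecognised rank cell: {cell!r}")
    return Rank(int(match.group(1)), int(match.group(2)))


# Rows copied from the table above: (benchmark, metric, value in %, rank cell).
rows = [
    ("CC-OCR", "multi-scene-f1", 47.5, "#9/9"),
    ("CC-OCR", "multilingual-f1", 36.2, "#7/8"),
]

# Sort by rank position only; the date tie-break is omitted (no dates recorded).
ordered = sorted(rows, key=lambda row: parse_rank(row[3]).position)

for benchmark, metric, value, rank_cell in ordered:
    print(benchmark, metric, f"{value}%", rank_cell)
```

Sorting on the parsed integer rather than the raw string keeps `#10/…` after `#9/…`, which a plain string sort would not.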

§ 03 · Strengths by area