InternVL2-76B.

Shanghai AI Lab · open-source · 76B params · Vision-Language Model · MIT

§ 02 · Benchmarks

Every benchmark InternVL2-76B has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	VQA v2.0	Multimodal · Visual Question Answering	accuracy	87.2%	#2/16	2024-04-25	source ↗
02	CC-OCR	Computer Vision · General OCR Capabilities	multi-scene-f1	76.9%	#3/9	—	source ↗
03	CC-OCR	Computer Vision · General OCR Capabilities	kie-f1	61.6%	#5/5	—	source ↗
04	TextVQA	Multimodal · Visual Question Answering	accuracy	84.4%	#6/23	2024-04-25	source ↗
05	CC-OCR	Computer Vision · General OCR Capabilities	document-parsing	35.3%	#6/6	—	source ↗
06	CC-OCR	Computer Vision · General OCR Capabilities	multilingual-f1	46.6%	#6/8	—	source ↗
07	MMBench	Multimodal · Visual Question Answering	accuracy	86.5%	#10/20	2024-04-25	source ↗
08	MMMU	Multimodal · Visual Question Answering	accuracy	67.4%	#17/30	2024-04-25	source ↗

The Rank column shows this model’s position among all models scored on the same benchmark and metric; the number after the slash is the size of that field. #1 in red means current SOTA. Rows are sorted by rank, then by newest result.
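The ordering rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not the site's implementation: the tuple layout and the `sort_key` helper are hypothetical, and placing undated rows after dated ones within the same rank is an assumption inferred from the order of the table.

```python
from datetime import date

# (benchmark, metric, rank, result date or None where the table shows a dash)
rows = [
    ("MMMU", "accuracy", 17, date(2024, 4, 25)),
    ("CC-OCR", "document-parsing", 6, None),
    ("TextVQA", "accuracy", 6, date(2024, 4, 25)),
    ("VQA v2.0", "accuracy", 2, date(2024, 4, 25)),
    ("CC-OCR", "kie-f1", 5, None),
]

def sort_key(row):
    """Hypothetical key: rank ascending, then newest date, undated rows last."""
    _, _, rank, when = row
    newest_first = -when.toordinal() if when is not None else 0
    return (rank, when is None, newest_first)

ordered = sorted(rows, key=sort_key)

for benchmark, metric, rank, when in ordered:
    print(f"#{rank}\t{benchmark}\t{metric}\t{when or 'undated'}")
```

Because Python's `sorted` is stable, rows that tie on both rank and date keep their input order.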

§ 03 · Strengths by area

Where InternVL2-76B actually performs.

§ 04 · Papers

1 paper with results for InternVL2-76B.

2024-04-25 · Multimodal · 4 results
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

§ 05 · Related models

Other Shanghai AI Lab models scored on Codesota.

78B params · 2 results

InternImage-H

Unknown params · 1 result

§ 06 · Sources & freshness

Where these numbers come from.

arxiv

results

cc-ocr-paper

results

alphaxiv-leaderboard

result

7 of 8 rows marked verified.