GPT-4V.

Unknown · multimodal · Unknown params · Transformer

GPT-4 with Vision. First major multimodal GPT-4 release (September 2023). Evaluated on MMMU, VQA, and TextVQA. Source: GPT-4 Technical Report.

§ 02 · Benchmarks

Every benchmark GPT-4V has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	VQA v2.0	Multimodal · Visual Question Answering	accuracy	77.2%	#13/16	2023-03-15	source ↗
02	TextVQA	Multimodal · Visual Question Answering	accuracy	78.0%	#16/23	2023-03-15	source ↗
03	MMBench	Multimodal · Visual Question Answering	accuracy	75.8%	#17/20	2023-03-15	source ↗
04	MMMU	Multimodal · Visual Question Answering	accuracy	56.8%	#25/30	2023-03-15	source ↗

The Rank column shows this model’s position against all other models scored on the same benchmark and metric (the number of competitors follows the slash). A #1 shown in red marks the current SOTA. Rows are sorted by rank, then by newest result.
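The ordering rule described above can be sketched in a few lines of Python. This is only an illustration, not the site's actual implementation: the row tuples are copied from the table, while the helper names `parse_rank`, `sort_rows`, and `is_sota` are hypothetical and introduced here for the example.

```python
from datetime import date

# Rows copied from the table above, deliberately shuffled:
# (benchmark, rank string, result date).
ROWS = [
    ("MMMU", "#25/30", date(2023, 3, 15)),
    ("VQA v2.0", "#13/16", date(2023, 3, 15)),
    ("MMBench", "#17/20", date(2023, 3, 15)),
    ("TextVQA", "#16/23", date(2023, 3, 15)),
]


def parse_rank(rank):
    """Split a rank string such as '#13/16' into (13, 16).

    The first number is the model's position; the second is the count
    shown after the slash on the page.
    """
    position, count = rank.lstrip("#").split("/")
    return int(position), int(count)


def is_sota(rank):
    """Assumption: a position of 1 is what the page highlights as SOTA."""
    return parse_rank(rank)[0] == 1


def sort_rows(rows):
    """Order by rank position (ascending), then by newest date first."""
    return sorted(rows, key=lambda r: (parse_rank(r[1])[0], -r[2].toordinal()))


ordered = [name for name, _, _ in sort_rows(ROWS)]
print(ordered)  # ['VQA v2.0', 'TextVQA', 'MMBench', 'MMMU']
```

Because every row here carries the same date, the date tie-breaker never fires; it would only matter if two benchmarks shared the same rank position.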

§ 03 · Strengths by area