Model card
LLaVA-1.5
UW-Madison / Microsoft · open source · 13B params · CLIP ViT-L + MLP projector + Vicuna-13B
LLaVA-1.5 improves on the original LLaVA with an MLP vision-language connector and additional academic VQA training data. 13B parameters. A strong open-source VLM baseline in 2023-2024. Source: arXiv:2310.03744.
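The architecture line above names the connector that distinguishes LLaVA-1.5: a small MLP that maps vision-encoder patch features into the language model's embedding space. Below is a minimal numpy sketch of that idea, assuming a two-layer Linear-GELU-Linear connector, 1024-d CLIP ViT-L/14 patch features, and Vicuna-13B's 5120-d embeddings (the 24x24 patch grid and random weights here are illustrative placeholders, not the released checkpoint).

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, common in transformer stacks
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def mlp_projector(patch_feats, w1, b1, w2, b2):
    """Two-layer MLP connector: Linear -> GELU -> Linear.
    Maps vision-encoder patch features into the LLM token-embedding space."""
    return gelu(patch_feats @ w1 + b1) @ w2 + b2

# Assumed dimensions: CLIP ViT-L/14 features are 1024-d; Vicuna-13B
# embeddings are 5120-d. Weights are random placeholders for illustration.
rng = np.random.default_rng(0)
d_vis, d_llm = 1024, 5120
w1 = rng.standard_normal((d_vis, d_llm), dtype=np.float32) * 0.02
b1 = np.zeros(d_llm, dtype=np.float32)
w2 = rng.standard_normal((d_llm, d_llm), dtype=np.float32) * 0.02
b2 = np.zeros(d_llm, dtype=np.float32)

patches = rng.standard_normal((576, d_vis), dtype=np.float32)  # 24x24 grid -> 576 tokens
visual_tokens = mlp_projector(patches, w1, b1, w2, b2)
print(visual_tokens.shape)  # (576, 5120)
```

The projected rows are then prepended to the text token embeddings, so the LLM attends over image patches and prompt tokens in one sequence.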
§ 01 · Benchmarks
Every benchmark LLaVA-1.5 has a recorded score for.
| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | VQA v2.0 | Multimodal · Visual Question Answering | accuracy | 80.0% | #5 | 2023-10-05 | arXiv:2310.03744 |
| 02 | MMBench | Multimodal · Visual Question Answering | accuracy | 67.7% | #8 | 2023-10-05 | arXiv:2310.03744 |
| 03 | TextVQA | Multimodal · Visual Question Answering | accuracy | 61.3% | #8 | 2023-10-05 | arXiv:2310.03744 |
The Rank column shows this model's position among all models scored on the same benchmark and metric; rank #1 indicates current SOTA. Rows are sorted by rank, then by newest result.
§ 03 · Papers
1 paper with results for LLaVA-1.5.
- Improved Baselines with Visual Instruction Tuning (LLaVA-1.5) · 2023-10-05 · Multimodal · 3 results
§ 05 · Sources & freshness
Where these numbers come from.
arXiv · 3 results · 3 of 3 rows marked verified.