VQA v2.0.

Name: VQA v2.0 Benchmark Results
Creator: Unknown
License: https://creativecommons.org/licenses/by/4.0/

265K images with 1.1M questions. The dataset is balanced to reduce the language biases found in v1.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

accuracy

Higher is better
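
This page does not define how accuracy is computed. VQA benchmarks conventionally use a consensus-based accuracy in which a predicted answer scores min(n / 3, 1), where n is the number of human annotators (typically 10 per question) who gave that answer. The following is a minimal sketch under that assumption. The function names `vqa_accuracy` and `mean_accuracy` are hypothetical, and the sketch only lowercases and trims answers, whereas official evaluation applies further answer normalization and averaging over annotator subsets, so it will not reproduce leaderboard numbers exactly.

```python
from typing import List, Sequence


def vqa_accuracy(prediction: str, human_answers: Sequence[str]) -> float:
    """Simplified consensus accuracy for one question: min(matches / 3, 1).

    Assumption: answers are compared after lowercasing and trimming only.
    """
    normalized = prediction.strip().lower()
    matches = sum(1 for answer in human_answers if answer.strip().lower() == normalized)
    return min(matches / 3.0, 1.0)


def mean_accuracy(predictions: Sequence[str], answers: Sequence[Sequence[str]]) -> float:
    """Average per-question accuracy, expressed as a percentage (0 to 100)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not predictions:
        raise ValueError("at least one question is required")
    scores: List[float] = [vqa_accuracy(p, a) for p, a in zip(predictions, answers)]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative data only: two questions with ten annotator answers each.
    answers = [
        ["yes"] * 2 + ["no"] * 8,
        ["red"] * 5 + ["maroon"] * 5,
    ]
    print(mean_accuracy(["yes", "red"], answers))
```

Under this scoring, an answer given by only two annotators earns partial credit (2/3), while any answer given by three or more annotators counts as fully correct.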

Trust tiers for accuracy: verified, paper, vendor, community, unverified

Rank	Model	Notes	Trust	Score	Year	Source
01	Qwen2-VL 72B	VQA-v2 test-dev. Table 1. arxiv:2409.12191	verified	87.6	2024	Source ↗
02	InternVL2-76B	VQA-v2 test-dev. Table 3. arxiv:2404.16821	verified	87.2	2024	Source ↗
03	Gemini 1.5 Pro	VQA-v2 test-dev. Table 5. Gemini 1.5 paper arxiv:2403.05530	verified	86.5	2024	Source ↗
04	PaLI-X 55B	VQA v2 test-dev. From Table 3 of PaLI-X paper (arxiv 2305.18565). State-of-the-art for encoder-decoder VLMs.	verified	86.1	2023	Source ↗
05	NVLM-D 1.0 72B	VQA v2 test-dev. From NVLM paper (arxiv 2409.11402) Table 7. Decoder-only architecture. Highest among open-access models at time of release.	verified	85.4	2024	Source ↗
06	NVLM-H 1.0 72B	VQA v2 test-dev. From NVLM paper (arxiv 2409.11402) Table 7. Hybrid architecture.	verified	85.2	2024	Source ↗
07	NVLM-X 1.0 72B	VQA v2 test-dev. From NVLM paper (arxiv 2409.11402) Table 7. Cross-attention architecture.	verified	85.2	2024	Source ↗
08	VILA-1.5 40B	VQA v2 test-dev. Reported in NVLM paper (arxiv 2409.11402) Table 7. VILA-1.5 40B released Apr 2024.	verified	84.3	2024	Source ↗
09	LLaVA-NeXT 34B	VQA v2 test-dev. From official LLaVA-NeXT (LLaVA-1.6) blog post, Jan 2024. Best open-source at time of release.	verified	83.7	2024	Source ↗
10	LLaVA-NeXT 13B	VQA v2 test-dev. From official LLaVA-NeXT (LLaVA-1.6) blog post, Jan 2024.	verified	82.8	2024	Source ↗
11	CogVLM-17B	VQA v2 test-dev. NeurIPS 2024. Tsinghua/Zhipu.	verified	82.3	2023	Source ↗
12	LLaVA-NeXT 7B (Mistral)	VQA v2 test-dev. From official LLaVA-NeXT (LLaVA-1.6) blog post, Jan 2024.	verified	82.2	2024	Source ↗
13	BLIP-2	VQA-v2 test-dev. FlanT5-XXL backbone. Table 9. arxiv:2301.12597	verified	82.19	2023	Source ↗
14	LLaVA-NeXT 7B (Vicuna)	VQA v2 test-dev. From official LLaVA-NeXT (LLaVA-1.6) blog post, Jan 2024.	verified	81.8	2024	Source ↗
15	Pixtral Large	VQA v2. Self-reported by Mistral AI. Pixtral Large 124B released Nov 2024. Score reported as 0.809 (80.9%).	paper	80.9	2024	Source ↗
16	Llama 3-V 405B	VQA v2 test-dev. Reported in NVLM paper (arxiv 2409.11402) Table 7.	verified	80.2	2024	Source ↗
17	LLaVA-1.5	VQA-v2 test-dev. 13B (Vicuna) backbone. Table 1. arxiv:2310.03744	verified	80	2023	Source ↗
18	LLaVA-1.5 13B	VQA v2 test-dev. From "Improved Baselines with Visual Instruction Tuning" (LLaVA-1.5), CVPR 2024. Also reported as baseline in LLaVA-NeXT blog.	verified	80	2023	Source ↗
19	Llama 3-V 70B	VQA v2 test-dev. Reported in NVLM paper (arxiv 2409.11402) Table 7.	verified	79.1	2024	Source ↗
20	Pixtral-12B	VQA v2. Self-reported by Mistral AI. Pixtral-12B released Sep 2024. Score reported as 0.786 (78.6%).	paper	78.6	2024	Source ↗
21	GPT-4o	VQA-v2 test-dev. GPT-4o system card Table 1. arxiv:2410.21276	verified	78.5	2024	Source ↗
22	Llama 3.2 90B Vision Instruct	VQA v2. Reported by Meta for Llama 3.2 90B multimodal model. Self-reported score of 0.781 (78.1%).	paper	78.1	2024	Source ↗
23	GPT-4V	VQA-v2 val, 0-shot. Table 2. GPT-4 Technical Report arxiv:2303.08774	verified	77.2	2023	Source ↗