| # | Entry | Status | Score | Year | Link |
| --- | --- | --- | --- | --- | --- |
| 01 | Qwen2-VL 72B VQA-v2 test-dev. Table 1. arxiv:2409.12191 | verified | 87.6 | 2024 | Source ↗ |
| 02 | InternVL2-76B VQA-v2 test-dev. Table 3. arxiv:2404.16821 | verified | 87.2 | 2024 | Source ↗ |
| 03 | Gemini 1.5 Pro VQA-v2 test-dev. Table 5. Gemini 1.5 paper, arxiv:2403.05530 | verified | 86.5 | 2024 | Source ↗ |
| 04 | PaLI-X 55B VQA-v2 test-dev. Table 3 of PaLI-X paper (arxiv:2305.18565). State-of-the-art for encoder-decoder VLMs. | verified | 86.1 | 2023 | Source ↗ |
| 05 | NVLM-D 1.0 72B VQA-v2 test-dev. NVLM paper (arxiv:2409.11402), Table 7. Decoder-only architecture. Highest among open-access models at time of release. | verified | 85.4 | 2024 | Source ↗ |
| 06 | NVLM-H 1.0 72B VQA-v2 test-dev. NVLM paper (arxiv:2409.11402), Table 7. Hybrid architecture. | verified | 85.2 | 2024 | Source ↗ |
| 07 | NVLM-X 1.0 72B VQA-v2 test-dev. NVLM paper (arxiv:2409.11402), Table 7. Cross-attention architecture. | verified | 85.2 | 2024 | Source ↗ |
| 08 | VILA-1.5 40B VQA-v2 test-dev. Reported in NVLM paper (arxiv:2409.11402), Table 7. VILA-1.5 40B released Apr 2024. | verified | 84.3 | 2024 | Source ↗ |
| 09 | LLaVA-NeXT 34B VQA-v2 test-dev. Official LLaVA-NeXT (LLaVA-1.6) blog post, Jan 2024. Best open-source at time of release. | verified | 83.7 | 2024 | Source ↗ |
| 10 | LLaVA-NeXT 13B VQA-v2 test-dev. Official LLaVA-NeXT (LLaVA-1.6) blog post, Jan 2024. | verified | 82.8 | 2024 | Source ↗ |
| 11 | CogVLM-17B VQA-v2 test-dev accuracy. NeurIPS 2024. Tsinghua/Zhipu. | verified | 82.3 | 2023 | Source ↗ |
| 12 | LLaVA-NeXT 7B (Mistral) VQA-v2 test-dev. Official LLaVA-NeXT (LLaVA-1.6) blog post, Jan 2024. | verified | 82.2 | 2024 | Source ↗ |
| 13 | BLIP-2 VQA-v2 test-dev. FlanT5-XXL backbone. Table 9. arxiv:2301.12597 | verified | 82.19 | 2023 | Source ↗ |
| 14 | LLaVA-NeXT 7B (Vicuna) VQA-v2 test-dev. Official LLaVA-NeXT (LLaVA-1.6) blog post, Jan 2024. | verified | 81.8 | 2024 | Source ↗ |
| 15 | Pixtral Large VQA-v2. Self-reported by Mistral AI. Pixtral Large 124B released Nov 2024. Score reported as 0.809 (80.9%). | paper | 80.9 | 2024 | Source ↗ |
| 16 | Llama 3-V 405B VQA-v2 test-dev. Reported in NVLM paper (arxiv:2409.11402), Table 7. | verified | 80.2 | 2024 | Source ↗ |
| 17 | LLaVA-1.5 13B (Vicuna) VQA-v2 test-dev. Table 1, arxiv:2310.03744 ("Improved Baselines with Visual Instruction Tuning", CVPR 2024). Also reported as baseline in the LLaVA-NeXT blog. | verified | 80.0 | 2023 | Source ↗ |
| 18 | Llama 3-V 70B VQA-v2 test-dev. Reported in NVLM paper (arxiv:2409.11402), Table 7. | verified | 79.1 | 2024 | Source ↗ |
| 19 | Pixtral-12B VQA-v2. Self-reported by Mistral AI. Pixtral-12B released Sep 2024. Score reported as 0.786 (78.6%). | paper | 78.6 | 2024 | Source ↗ |
| 20 | GPT-4o VQA-v2 test-dev. GPT-4o system card, Table 1. arxiv:2410.21276 | verified | 78.5 | 2024 | Source ↗ |
| 21 | Llama 3.2 90B Vision Instruct VQA-v2. Reported by Meta for the Llama 3.2 90B multimodal model. Self-reported score of 0.781 (78.1%). | paper | 78.1 | 2024 | Source ↗ |
| 22 | GPT-4V VQA-v2 val, 0-shot. Table 2, GPT-4 Technical Report, arxiv:2303.08774 | verified | 77.2 | 2023 | Source ↗ |
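The scores in the table are VQA accuracies. A minimal sketch of the soft-accuracy metric that VQA-v2 uses: each question comes with 10 human answers, and a prediction gets `min(matches/3, 1)` credit, averaged over questions. (The official evaluator additionally normalizes answer strings and averages over leave-one-out subsets of the 10 answers; both are omitted here for brevity, and the function names are illustrative, not from the official toolkit.)

```python
def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Soft accuracy for one question: full credit if >= 3 of the
    10 human annotators gave the predicted answer."""
    matches = sum(1 for a in human_answers if a == prediction)
    return min(matches / 3.0, 1.0)


def benchmark_score(predictions: list[str],
                    answer_sets: list[list[str]]) -> float:
    """Mean soft accuracy over all questions, as a percentage
    (the scale used by the leaderboard scores above)."""
    per_question = [vqa_accuracy(p, answers)
                    for p, answers in zip(predictions, answer_sets)]
    return 100.0 * sum(per_question) / len(per_question)
```

For example, a prediction matched by only 2 of 10 annotators earns 2/3 credit, so a model can score between 0 and 100 even on yes/no questions where annotators disagree.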