14,055 questions that require outside knowledge to answer. The benchmark tests whether models can draw on external knowledge sources beyond the visual content of the image.
Metric: accuracy (%). Higher is better.
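The page does not say how accuracy is computed. The sketch below assumes the scores use the standard VQA accuracy formula, in which a prediction's score is min(matching human answers / 3, 1), averaged over every leave-one-out subset of the human answers. It is an illustration, not the official scorer. The function names `vqa_accuracy` and `dataset_accuracy` are hypothetical. The official evaluation also normalizes punctuation, articles, and number words, and this sketch omits that step.

```python
from typing import List, Sequence


def vqa_accuracy(prediction: str, human_answers: Sequence[str]) -> float:
    """Simplified VQA-style accuracy for one question (hypothetical helper).

    For each human answer, drop it and count how many of the remaining
    answers match the prediction; score min(matches / 3, 1). The result is
    the mean over all leave-one-out subsets. Normalization here is only
    lowercasing and whitespace stripping (an assumption; the official
    scorer does more).
    """
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    scores = []
    for left_out in range(len(answers)):
        # Count matches among all answers except the one left out.
        matches = sum(
            1 for i, a in enumerate(answers) if i != left_out and a == pred
        )
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


def dataset_accuracy(
    predictions: Sequence[str], references: Sequence[Sequence[str]]
) -> float:
    """Mean per-question accuracy as a percentage (hypothetical helper)."""
    per_question: List[float] = [
        vqa_accuracy(p, refs) for p, refs in zip(predictions, references)
    ]
    return 100.0 * sum(per_question) / len(per_question)


if __name__ == "__main__":
    refs = ["dog"] * 3 + ["cat"] * 7
    # Leaving out a "dog" gives 2/3; leaving out a "cat" gives 1.
    print(vqa_accuracy("Dog", refs))  # about 0.9
```

Under this formula, a prediction matched by at least four of ten annotators always scores 1.0. Answers with only partial annotator agreement earn fractional credit.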
| Rank | Model | Notes | Source | Accuracy (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | PaLI-X-55B | PaLI-X 55B, fine-tuned on OK-VQA. Google Research, 2023. | Community | 66.1 | 2023 | Source |
| 2 | PaLI-17B | Fine-tuned on OK-VQA. Google Research, ICLR 2023. | Community | 64.5 | 2022 | Source |
| 3 | GPT-4V | Zero-shot on the OK-VQA test set (commonsense knowledge subset). OpenAI, Nov 2023. | Community | 64.28 | 2023 | Source |
| 4 | Flamingo-80B | 32-shot on the OK-VQA test set. DeepMind, NeurIPS 2022. | Community | 57.8 | 2022 | Source |
| 5 | BLIP-2 (FlanT5XXL) | FlanT5XXL language backbone; zero-shot on the OK-VQA test set. Salesforce, ICML 2023. | Community | 44.7 | 2023 | Source |