OK-VQA contains 14,055 questions that require outside knowledge to answer. It tests whether models can consult external knowledge sources beyond the visual content of the image.
Accuracy is the reported evaluation metric for OK-VQA. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
Muted rows were not state of the art when published: an earlier or same-year result already scored better.
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | PaLI-X-55B | verified | 66.1 | 2023 | Source ↗ |
| 02 | PaLI-17B | verified | 64.5 | 2022 | Source ↗ |
| 03 | GPT-4V | verified | 64.28 | 2023 | Source ↗ |
| 04 | Flamingo-80B | verified | 57.8 | 2022 | Source ↗ |
| 05 | BLIP-2 (FlanT5XXL) | verified | 44.7 | 2023 | Source ↗ |
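The muting rule stated above the table can be sketched in a few lines. This is only an illustration of one reading of that rule, not Codesota's actual implementation: the `Row` type and the `muted_rows` function are hypothetical names, and the sketch assumes year-level granularity, so any higher score from the same or an earlier year counts against a row.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One leaderboard entry (hypothetical structure for this sketch)."""
    model: str
    score: float  # accuracy, higher is better
    year: int


# Scores and years copied from the table above.
ROWS = [
    Row("PaLI-X-55B", 66.1, 2023),
    Row("PaLI-17B", 64.5, 2022),
    Row("GPT-4V", 64.28, 2023),
    Row("Flamingo-80B", 57.8, 2022),
    Row("BLIP-2 (FlanT5XXL)", 44.7, 2023),
]


def muted_rows(rows: list[Row]) -> list[str]:
    """Return models that were not state of the art when published.

    Assumed rule: a row is muted if some other row from the same or an
    earlier year has a strictly higher score.
    """
    muted = []
    for row in rows:
        beaten = any(
            other.year <= row.year and other.score > row.score
            for other in rows
        )
        if beaten:
            muted.append(row.model)
    return muted


if __name__ == "__main__":
    print(muted_rows(ROWS))
```

Under this reading, GPT-4V, Flamingo-80B, and BLIP-2 (FlanT5XXL) would be muted, while PaLI-17B would not, because the only higher score (PaLI-X-55B) was published in a later year.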