
OK-VQA.

14,055 questions requiring outside knowledge to answer. Tests models that must consult external knowledge sources beyond visual content.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


accuracy

Accuracy is the reported evaluation metric for OK-VQA. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
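
The page does not spell out how accuracy is computed. As a rough illustration, the sketch below assumes the scores follow the standard VQA-style soft accuracy, where a prediction earns min(matches / 3, 1) against the human answers for a question. The function names are hypothetical, and the simplification is deliberate: official evaluation scripts also normalize punctuation and articles and average over annotator subsets, which this sketch omits.

```python
def vqa_soft_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Simplified VQA-style accuracy for one question.

    Assumption: a prediction counts as fully correct once at least
    three human annotators gave the same answer.
    """
    pred = prediction.strip().lower()
    matches = sum(1 for answer in human_answers if answer.strip().lower() == pred)
    return min(matches / 3.0, 1.0)


def benchmark_accuracy(predictions: list[str], references: list[list[str]]) -> float:
    """Mean per-question accuracy, reported as a percentage."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    scores = [vqa_soft_accuracy(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)
```

Under this assumption, an answer matched by two annotators scores 2/3 on that question rather than 0 or 1, which is why reported accuracies are not simple exact-match rates.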

Trust tiers for accuracy: verified, paper, vendor, community, unverified.

Muted rows were not state of the art when published: an earlier or same-year result already scored better.

| Rank | Model | Trust | Score | Year | Notes |
|------|-------|-------|-------|------|-------|
| 01 | PaLI-X-55B | verified | 66.1 | 2023 | PaLI-X 55B fine-tuned on OK-VQA. 2023. Google Research. |
| 02 | PaLI-17B | verified | 64.5 | 2022 | PaLI-17B fine-tuned on OK-VQA. ICLR 2023. Google Research. |
| 03 | GPT-4V | verified | 64.28 | 2023 | GPT-4V zero-shot on OK-VQA test (commonsense knowledge subset). Nov 2023. OpenAI. |
| 04 | Flamingo-80B | verified | 57.8 | 2022 | Flamingo-80B, 32-shot. OK-VQA test set. NeurIPS 2022. DeepMind. |
| 05 | BLIP-2 (FlanT5XXL) | verified | 44.7 | 2023 | BLIP-2 with FlanT5XXL backbone. Zero-shot OK-VQA test. ICML 2023. Salesforce. |
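
The muting rule stated above the table can be sketched in a few lines. This is only an illustration of the rule as worded (a row is muted when another row from the same or an earlier year has a higher score), not the site's actual implementation. The `Row` type and function names are hypothetical, and the comparison assumes year-level granularity because that is all the table provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    score: float
    year: int


# Rows copied from the accuracy table on this page.
ROWS = [
    Row("PaLI-X-55B", 66.1, 2023),
    Row("PaLI-17B", 64.5, 2022),
    Row("GPT-4V", 64.28, 2023),
    Row("Flamingo-80B", 57.8, 2022),
    Row("BLIP-2 (FlanT5XXL)", 44.7, 2023),
]


def is_muted(row: Row, rows: list[Row]) -> bool:
    """True if an earlier or same-year row already scored better."""
    return any(
        other.year <= row.year and other.score > row.score
        for other in rows
        if other is not row
    )


def muted_models(rows: list[Row]) -> list[str]:
    """Names of rows that were not state of the art when published."""
    return [row.model for row in rows if is_muted(row, rows)]


if __name__ == "__main__":
    print(muted_models(ROWS))
```

Applied to the five rows above, this flags GPT-4V, Flamingo-80B, and BLIP-2 (FlanT5XXL), leaving PaLI-17B (2022) and PaLI-X-55B (2023) as the state-of-the-art entries for their years.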
§ 03 · Lineage

OK-VQA in context.

This benchmark (1)
OK-VQA · active · 2019-06

Successors (1)
A-OKVQA · active · 2022-06
Broader knowledge types and better annotation.
