Codesota · Benchmark · TextVQA
Facebook AI Research

TextVQA.

TextVQA evaluates a model's ability to read and reason about text embedded in images. The full dataset (train, validation, and test splits combined) contains 45,336 questions over 28,408 images with prominent scene text, pushing models beyond pure object recognition into OCR-grounded visual reasoning.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


Accuracy

VQA-style accuracy across answer variants; higher is better.
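
As a rough illustration of how VQA-style accuracy is usually computed, the sketch below scores a prediction against the human reference answers for one question (TextVQA provides 10 per question) using the common simplified rule min(matches / 3, 1). The function names `vqa_accuracy` and `mean_accuracy` are hypothetical, and the lowercase-and-strip normalization is an assumption: the official evaluator applies more extensive answer normalization and averages over subsets of annotators, so numbers from this sketch may differ slightly from reported scores.

```python
def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Simplified VQA accuracy for one question: min(#matching answers / 3, 1)."""
    # Assumed normalization: trim whitespace and lowercase only.
    def norm(s: str) -> str:
        return s.strip().lower()

    target = norm(prediction)
    matches = sum(1 for answer in human_answers if norm(answer) == target)
    # Full credit once at least three annotators gave the predicted answer.
    return min(matches / 3.0, 1.0)


def mean_accuracy(predictions: list[str], references: list[list[str]]) -> float:
    """Average per-question accuracy, reported on a 0-100 scale."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    scores = [vqa_accuracy(p, r) for p, r in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)
```

Under this rule an answer given by only two annotators earns partial credit (2/3), which is why the metric is described as accuracy "across answer variants" rather than exact match against a single label.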


Trust tiers for Accuracy: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score | Year | Source note |
|------|-------|-------|-------|------|-------------|
| 01 | Ovis2.5-9B | unverified | 91.2 | 2025 | |
| 02 | Qwen2.5-VL 72B | verified | 85.5 | 2026 | TextVQA val. Qwen2.5-VL 72B. Table 2. arxiv:2502.13923 |
| 03 | Qwen2-VL 72B | unverified | 85.5 | 2024 | |
| 04 | Llama 3-V (405B) | unverified | 84.8 | 2024 | |
| 05 | InternVL2-76B | verified | 84.4 | 2024 | TextVQA val. InternVL2-76B. Table 3. arxiv:2404.16821 |
| 06 | Qwen2-VL 7B | unverified | 84.3 | 2024 | |
| 07 | MiniCPM-o 4.5-Instruct | unverified | 83.8 | 2026 | |
| 08 | Qwen2.5-VL-72B | unverified | 83.5 | 2025 | |
| 09 | Llama 3.2 Vision 90B | verified | 83.4 | 2024 | TextVQA val. Llama 3.2 Vision 90B. Table 3. arxiv:2407.21783 |
| 10 | BLIP3-o (8B) | unverified | 83.1 | 2025 | |
| 11 | Gemini 1.5 Pro | verified | 82.2 | 2024 | TextVQA val. Gemini 1.5 Pro. Table 5. arxiv:2403.05530 |
| 12 | Aria | unverified | 81.1 | 2024 | |
| 13 | Qianfan-OCR | unverified | 80 | 2026 | |
| 14 | Qwen2-VL-2B | unverified | 79.7 | 2024 | |
| 15 | GPT-4V | verified | 78 | 2023 | TextVQA val. GPT-4V. Reported in multiple papers (Qwen2-VL Table 1, InternVL2 Table 3). |
| 16 | GPT-4o | verified | 77.4 | 2024 | TextVQA val. GPT-4o. System card Table 1. arxiv:2410.21276 |
| 17 | MiniCPM-Llama3-V 2.5 | unverified | 76.6 | 2024 | |
| 18 | ZAYA1-VL-8B | unverified | 74.4 | 2026 | |
| 19 | LLaVA-1.5 | verified | 61.3 | 2023 | TextVQA val. 13B. Table 1. arxiv:2310.03744 |
| 20 | AIMv2 ViT-3B/14 + Llama 3.0 8B | unverified | 58.2 | 2024 | |
| 21 | BLIP-2 | verified | 42.5 | 2023 | TextVQA val. FlanT5-XXL backbone. Table 9. arxiv:2301.12597 |
| 22 | Flamingo (32-shot) | unverified | 37.9 | 2022 | |

§ 03 · Lineage

TextVQA in context.

This benchmark (1)
TextVQA · active · 2019-04
None yet; this is the current frontier.