Who leads the TextVQA benchmark?

Ovis2.5-9B currently leads TextVQA with an Accuracy score of 91.2, listed as an unverified result. The highest verified score is 85.5, from Qwen2.5-VL 72B.

What is the state-of-the-art score on TextVQA?

As of 2026, the state-of-the-art result on TextVQA is 91.2 (Accuracy), achieved by Ovis2.5-9B in 2025.

How many models are tracked on TextVQA?

Codesota tracks 22 models on TextVQA.

When was the TextVQA leaderboard last updated?

The TextVQA leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2022.

Facebook AI Research

TextVQA.

Name: TextVQA Benchmark Results
Creator: Facebook AI Research
Published: 2022-01-01
License: https://creativecommons.org/licenses/by/4.0/

TextVQA evaluates a model's ability to read and reason about text embedded in images. The full dataset contains 45,336 questions over 28,408 images with prominent scene text, pushing models beyond pure object recognition into OCR-grounded visual reasoning.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

Accuracy

VQA-style accuracy across answer variants; higher is better.
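The page does not spell out the formula, so the following is a minimal sketch of the commonly used VQA accuracy rule, under the assumption that this leaderboard follows it: a prediction earns full credit when at least three human annotators gave the same answer, and partial credit otherwise. The function names and the lowercase-and-strip normalization are illustrative; the official evaluation applies more extensive answer normalization and averages over annotator subsets.

```python
from collections import Counter


def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Simplified per-question VQA accuracy: min(matching annotators / 3, 1)."""
    normalized = prediction.strip().lower()
    # Count how many annotators gave each (lightly normalized) answer.
    counts = Counter(answer.strip().lower() for answer in human_answers)
    return min(counts[normalized] / 3.0, 1.0)


def benchmark_accuracy(predictions: list[str], references: list[list[str]]) -> float:
    """Mean per-question accuracy, reported on a 0-100 scale."""
    scores = [vqa_accuracy(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical annotations for one question: 2 annotators said "stop",
# 8 said "stop sign".
answers = ["stop"] * 2 + ["stop sign"] * 8

full_credit = vqa_accuracy("stop sign", answers)   # 8 matches, capped at 1.0
partial_credit = vqa_accuracy("stop", answers)     # 2 matches, 2/3 credit
no_credit = vqa_accuracy("yield", answers)         # 0 matches
```

Partial credit is why scores are not simply the share of exactly matched answers: an answer that only a minority of annotators gave still contributes a fraction of a point.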

Trust tiers for Accuracy: verified, paper, vendor, community, unverified.

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Trust	Score	Year	Source note
01	Ovis2.5-9B	unverified	91.2	2025
02	Qwen2.5-VL 72B	verified	85.5	2026	TextVQA val. Table 2. arxiv:2502.13923
03	Qwen2-VL 72B	unverified	85.5	2024
04	Llama 3-V (405B)	unverified	84.8	2024
05	InternVL2-76B	verified	84.4	2024	TextVQA val. Table 3. arxiv:2404.16821
06	Qwen2-VL 7B	unverified	84.3	2024
07	MiniCPM-o 4.5-Instruct	unverified	83.8	2026
08	Qwen2.5-VL-72B	unverified	83.5	2025
09	Llama 3.2 Vision 90B	verified	83.4	2024	TextVQA val. Table 3. arxiv:2407.21783
10	BLIP3-o (8B)	unverified	83.1	2025
11	Gemini 1.5 Pro	verified	82.2	2024	TextVQA val. Table 5. arxiv:2403.05530
12	Aria	unverified	81.1	2024
13	Qianfan-OCR	unverified	80	2026
14	Qwen2-VL-2B	unverified	79.7	2024
15	GPT-4V	verified	78	2023	TextVQA val. Reported in multiple papers (Qwen2-VL Table 1, InternVL2 Table 3).
16	GPT-4o	verified	77.4	2024	TextVQA val. System card Table 1. arxiv:2410.21276
17	MiniCPM-Llama3-V 2.5	unverified	76.6	2024
18	ZAYA1-VL-8B	unverified	74.4	2026
19	LLaVA-1.5	verified	61.3	2023	TextVQA val. 13B. Table 1. arxiv:2310.03744
20	AIMv2 ViT-3B/14 + Llama 3.0 8B	unverified	58.2	2024
21	BLIP-2	verified	42.5	2023	TextVQA val. FlanT5-XXL backbone. Table 9. arxiv:2301.12597
22	Flamingo (32-shot)	unverified	37.9	2022
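The muting rule stated above the table can be sketched in a few lines. This is an illustrative reading of that rule, not the site's actual implementation: a row counts as muted when some other row from the same or an earlier year has a strictly higher score. The `Result` type and `is_muted` helper are hypothetical names; the sample rows are a subset copied from the table.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    score: float
    year: int


def is_muted(row: Result, rows: list[Result]) -> bool:
    """True if an earlier or same-year result already scored strictly better."""
    return any(o.year <= row.year and o.score > row.score for o in rows)


# A subset of the leaderboard rows (model, score, year).
results = [
    Result("Flamingo (32-shot)", 37.9, 2022),
    Result("GPT-4V", 78.0, 2023),
    Result("LLaVA-1.5", 61.3, 2023),
    Result("Qwen2-VL 72B", 85.5, 2024),
    Result("Llama 3-V (405B)", 84.8, 2024),
    Result("Ovis2.5-9B", 91.2, 2025),
    Result("MiniCPM-o 4.5-Instruct", 83.8, 2026),
]

# Rows that were state of the art when published, in chronological order.
frontier = [
    r.model
    for r in sorted(results, key=lambda r: r.year)
    if not is_muted(r, results)
]
print(frontier)
```

Under this reading, tied scores are not muted, since a tie is not a strictly better result.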

Lineage

TextVQA in context.

Predecessors (1)

saturated · 2017-04

VQAv2

Reading text in the image — OCR-grounded sub-task.

This benchmark (1)

active · 2019-04

TextVQA

Successors: none yet. This is the current frontier.
