Visual Question Answering | 2019 | en

TextVQA: Towards VQA Models That Can Read

A Visual Question Answering dataset that requires models to read and reason about text in natural images. It contains 45,336 questions about 28,408 images from the Open Images dataset. Questions require OCR-based reasoning, e.g. "What does the sign say?". It is a standard benchmark for evaluating text understanding within visual scenes. Listed metrics: ANLS and exact-match accuracy.
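The two metrics named above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's official evaluation script: the function names, the lowercase/whitespace normalization, and the 0.5 ANLS threshold are assumptions made for this example, and the official evaluation may normalize or score answers differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, references: list[str]) -> float:
    """1.0 if the prediction equals any reference after normalization."""
    pred = normalize(prediction)
    return 1.0 if any(pred == normalize(ref) for ref in references) else 0.0


def anls_score(prediction: str, references: list[str],
               threshold: float = 0.5) -> float:
    """Per-question ANLS: best normalized Levenshtein similarity over
    the references, zeroed when the normalized distance reaches the
    threshold (0.5 is assumed here)."""
    pred = normalize(prediction)
    best = 0.0
    for ref in references:
        ref_norm = normalize(ref)
        longest = max(len(pred), len(ref_norm))
        distance = levenshtein(pred, ref_norm) / longest if longest else 0.0
        similarity = 1.0 - distance if distance < threshold else 0.0
        best = max(best, similarity)
    return best


def dataset_mean(scores: list[float]) -> float:
    """Dataset-level score: the mean of the per-question scores."""
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a prediction of "stap" against the reference "stop" fails exact match but still earns partial ANLS credit, which is why ANLS is more forgiving of small OCR errors.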

Samples: 45,336
Metrics: accuracy
Paper / Website
Current State of the Art

Qwen2.5-VL 72B (Alibaba): 85.5 accuracy

Accuracy Progress Over Time

Showing 6 breakthroughs from Jan 2023 to Feb 2025

[Chart: accuracy (y-axis) against date (x-axis), Jan 2023 to Feb 2025]

Key Milestones

Jan 2023: BLIP-2, 42.5
TextVQA val. FlanT5-XXL backbone. Table 9. arxiv:2301.12597

Mar 2023: GPT-4V, 78.0 (+83.5%)
TextVQA val. GPT-4V. Reported in multiple papers (Qwen2-VL Table 1, InternVL2 Table 3).

Mar 2024: Gemini 1.5 Pro, 82.2 (+5.4%)
TextVQA val. Gemini 1.5 Pro. Table 5. arxiv:2403.05530

Apr 2024: InternVL2-76B, 84.4 (+2.7%)
TextVQA val. InternVL2-76B. Table 3. arxiv:2404.16821

Sep 2024: Qwen2-VL 72B, 84.9 (+0.6%)
TextVQA val. Qwen2-VL 72B. Table 1. arxiv:2409.12191

Feb 2025: Qwen2.5-VL 72B (Current SOTA), 85.5 (+0.7%)
TextVQA val. Qwen2.5-VL 72B. Table 2. arxiv:2502.13923

Total Improvement: 101.2%
Time Span: 2y 1m
Breakthroughs: 6
Current SOTA: 85.5
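The percentage gains attached to each milestone, and the total improvement figure, appear to be relative changes over the preceding score. A short sketch of that arithmetic follows, assuming the formula (current - previous) / previous and rounding to one decimal place; the variable and function names are illustrative only.

```python
# Milestone scores as listed on this page, in chronological order.
milestones = [
    ("BLIP-2", 42.5),
    ("GPT-4V", 78.0),
    ("Gemini 1.5 Pro", 82.2),
    ("InternVL2-76B", 84.4),
    ("Qwen2-VL 72B", 84.9),
    ("Qwen2.5-VL 72B", 85.5),
]


def relative_gain(previous: float, current: float) -> float:
    """Relative improvement of `current` over `previous`, in percent."""
    return (current - previous) / previous * 100.0


# Gain of each milestone over the one before it.
step_gains = [
    (name, round(relative_gain(prev_score, score), 1))
    for (_, prev_score), (name, score) in zip(milestones, milestones[1:])
]

# Gain of the latest milestone over the first one.
total_gain = round(relative_gain(milestones[0][1], milestones[-1][1]), 1)

for name, gain in step_gains:
    print(f"{name}: +{gain}%")
print(f"Total improvement: {total_gain}%")
```

Because these are relative rather than absolute changes, the jump from 42.5 to 78.0 dominates the total even though the later steps are each under one point to a few points of accuracy.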

Top Models Performance Comparison

Top 9 models ranked by accuracy

Accuracy and percentage of the best score:
1. Qwen2.5-VL 72B: 85.5 (100.0%)
2. Qwen2-VL 72B: 84.9 (99.3%)
3. InternVL2-76B: 84.4 (98.7%)
4. Llama 3.2 Vision 90B: 83.4 (97.5%)
5. Gemini 1.5 Pro: 82.2 (96.1%)
6. GPT-4V: 78.0 (91.2%)
7. GPT-4o: 77.4 (90.5%)
8. LLaVA-1.5: 61.3 (71.7%)
9. BLIP-2: 42.5 (49.7%)

Best Score: 85.5
Top Model: Qwen2.5-VL 72B
Models Compared: 9
Score Range: 43.0
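The percentage-of-best figures and the score range in this comparison can be reproduced from the raw scores. The sketch below assumes each percentage is the model's score divided by the top score, and that the score range is the top score minus the lowest; the names used are illustrative.

```python
# Accuracy scores as listed in the comparison on this page.
scores = {
    "Qwen2.5-VL 72B": 85.5,
    "Qwen2-VL 72B": 84.9,
    "InternVL2-76B": 84.4,
    "Llama 3.2 Vision 90B": 83.4,
    "Gemini 1.5 Pro": 82.2,
    "GPT-4V": 78.0,
    "GPT-4o": 77.4,
    "LLaVA-1.5": 61.3,
    "BLIP-2": 42.5,
}

best_score = max(scores.values())
top_model = max(scores, key=scores.get)

# Each model's score as a percentage of the best score.
percent_of_best = {
    model: round(score / best_score * 100.0, 1)
    for model, score in scores.items()
}

# Spread between the strongest and weakest listed model.
score_range = round(best_score - min(scores.values()), 1)

# Print the ranking from highest to lowest score.
for rank, (model, score) in enumerate(
        sorted(scores.items(), key=lambda item: item[1], reverse=True), start=1):
    print(f"{rank}. {model}: {score} ({percent_of_best[model]}% of best)")
print(f"Score range: {score_range}")
```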

accuracy (Primary)

# | Model | Score | Date
1 | Qwen2.5-VL 72B (Open Source, Alibaba) | 85.5 | Feb 2025
2 | Qwen2-VL 72B (Open Source, Alibaba) | 84.9 | Sep 2024
3 | InternVL2-76B (Open Source, Shanghai AI Lab) | 84.4 | Apr 2024
4 | Llama 3.2 Vision 90B (Open Source, Meta) | 83.4 | Jul 2024
5 | Gemini 1.5 Pro (API, Google) | 82.2 | Feb 2024
6 | GPT-4V | 78.0 | Mar 2023
7 | GPT-4o (API, OpenAI) | 77.4 | Oct 2024
8 | LLaVA-1.5 (Open Source, UW-Madison / Microsoft) | 61.3 | Oct 2023
9 | BLIP-2 (Open Source, Salesforce) | 42.5 | Jan 2023

Related Papers (9)

Other Visual Question Answering Datasets