Codesota · Multimodal · Visual Question Answering · TextVQA
Visual Question Answering · benchmark dataset · 2019 · EN

TextVQA: Towards VQA Models That Can Read.

A Visual Question Answering dataset requiring models to read and reason about text in natural images. It contains 45,336 questions about 28,408 images drawn from the Open Images dataset. Questions require OCR-based reasoning, e.g. "What does the sign say?". A standard benchmark for evaluating text understanding within visual scenes. Reported metrics include exact-match accuracy (the metric indexed here) and ANLS.
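As a rough illustration of how VQA-style accuracy is scored, the sketch below gives a simplified soft-accuracy function: a prediction earns full credit if at least 3 of the human annotators gave that exact answer. This is an assumption-laden simplification; the official scorer also normalizes answer strings and averages over leave-one-out annotator subsets.

```python
def vqa_soft_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Simplified VQA-style soft accuracy.

    A prediction scores 1.0 if at least 3 of the (typically ~10) human
    annotators gave that exact answer, and matches/3 otherwise. The
    official scorer additionally lowercases/strips answers and averages
    over annotator subsets; this sketch skips that.
    """
    matches = sum(1 for a in human_answers if a == prediction)
    return min(matches / 3.0, 1.0)
```

For example, `vqa_soft_accuracy("stop", ["stop", "stop", "go", "exit"])` scores 2/3, since only two annotators agreed with the prediction.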

§ 01 · Leaderboard

Best published scores.

9 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.


Primary metric: accuracy · higher is better · 9 rows
# · Model · Access · Org · Submitted · Paper / code · accuracy
01 · Qwen2.5-VL 72B · OSS · Alibaba · Feb 2025 · Qwen2.5-VL Technical Report · 85.50
02 · Qwen2-VL 72B · OSS · Alibaba · Sep 2024 · Qwen2-VL: Enhancing Vision-Language Model's Perception o… · 84.90
03 · InternVL2-76B · OSS · Shanghai AI Lab · Apr 2024 · InternVL: Scaling up Vision Foundation Models and Aligni… · 84.40
04 · Llama 3.2 Vision 90B · OSS · Meta · Jul 2024 · The Llama 3 Herd of Models · 83.40
05 · Gemini 1.5 Pro · API · Google · Feb 2024 · Gemini 1.5: Unlocking multimodal understanding across mi… · 82.20
06 · GPT-4V · API · OpenAI · Mar 2023 · GPT-4 Technical Report · 78.00
07 · GPT-4o · API · OpenAI · Oct 2024 · GPT-4o System Card · 77.40
08 · LLaVA-1.5 · OSS · UW-Madison / Microsoft · Oct 2023 · Improved Baselines with Visual Instruction Tuning (LLaVA… · 61.30
09 · BLIP-2 · OSS · Salesforce · Jan 2023 · BLIP-2: Bootstrapping Language-Image Pre-training with F… · 42.50
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
§ 03 · Progress

6 steps
of state of the art.

Each row below marks a model that broke the previous record on accuracy. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · accuracy
  1. Jan 30, 2023 · BLIP-2 · Salesforce · 42.50
  2. Mar 15, 2023 · GPT-4V · OpenAI · 78.00
  3. Feb 15, 2024 · Gemini 1.5 Pro · Google · 82.20
  4. Apr 25, 2024 · InternVL2-76B · Shanghai AI Lab · 84.40
  5. Sep 18, 2024 · Qwen2-VL 72B · Alibaba · 84.90
  6. Feb 19, 2025 · Qwen2.5-VL 72B · Alibaba · 85.50
Fig 3 · SOTA-setting models only. 6 entries span Jan 2023 to Feb 2025.
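The progress chart above is just a running maximum over dated results. A minimal sketch, using dates from the figure (dates for non-SOTA rows are illustrative):

```python
from datetime import date

# (date, model, accuracy) triples; LLaVA-1.5's exact date is illustrative.
results = [
    (date(2023, 1, 30), "BLIP-2", 42.50),
    (date(2023, 3, 15), "GPT-4V", 78.00),
    (date(2023, 10, 5), "LLaVA-1.5", 61.30),
    (date(2024, 2, 15), "Gemini 1.5 Pro", 82.20),
    (date(2024, 4, 25), "InternVL2-76B", 84.40),
    (date(2024, 9, 18), "Qwen2-VL 72B", 84.90),
    (date(2025, 2, 19), "Qwen2.5-VL 72B", 85.50),
]

def sota_steps(rows):
    """Walk results in date order, keeping only rows that beat the
    running best score (higher is better)."""
    best = float("-inf")
    steps = []
    for when, model, score in sorted(rows):
        if score > best:
            best = score
            steps.append((when, model, score))
    return steps
```

Here `sota_steps(results)` drops LLaVA-1.5, which scored below the record standing at its submission date, and returns the six SOTA-setting entries.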
§ 04 · Literature

9 papers
tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

§ 06 · Contribute

Have a score that beats
this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and — if it takes the top — annotate the step on the progress chart with your name.

Submit a result · Read submission guide
What a submission needs
  • 01 A public checkpoint or API endpoint
  • 02 A reproduction script with frozen commit + seed
  • 03 Declared evaluation environment (Python, deps)
  • 04 One row per metric declared by this dataset
  • 05 A contact so we can follow up on discrepancies
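A reproduction script covering items 02–04 might look like the skeleton below. This is a hypothetical sketch, not the official submission format: the seed value is a placeholder, and the evaluation call is left as a stub.

```python
import json
import platform
import random

SEED = 1234  # frozen seed (checklist item 02); value is a placeholder

def set_seed(seed: int) -> None:
    """Pin every RNG the evaluation touches; extend with numpy/torch
    seeding if the harness uses them."""
    random.seed(seed)

def environment_report() -> dict:
    """Declared evaluation environment (checklist item 03)."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "seed": SEED,
    }

def main() -> None:
    set_seed(SEED)
    # Placeholder: run the actual TextVQA evaluation here, producing
    # one score per metric the dataset declares (checklist item 04).
    scores = {"accuracy": None}
    print(json.dumps({"env": environment_report(), "scores": scores}, indent=2))

if __name__ == "__main__":
    main()
```

Pinning the commit hash of the evaluation code alongside the seed makes the run attributable as well as repeatable, which is what reviewers need when following up on discrepancies.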