Codesota · Computer Vision · Document Understanding · DocVQA
Document Understanding · benchmark dataset · 2020 · EN

Document Visual Question Answering.

Document question-answering benchmark where systems answer natural-language questions grounded in document images.

§ 01 · Leaderboard

Best published scores.

21 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.


Primary metric: anls · higher is better · 21 rows
| # | Model | Org | Submitted | Paper / code | anls |
|---|---|---|---|---|---|
| 01 | Qwen3-VL-235B-A22B-Instruct | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 97.10 |
| 02 | Qwen3-VL-235B-A22B-Thinking | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 96.50 |
| 03 | Qwen2-VL 72B | Open · Alibaba | Sep 2024 | Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code | 96.50 |
| 04 | Infinity-Parser2-Pro | | May 2026 | pwc-dump | 96.43 |
| 05 | Qwen2.5-VL-72B | | Feb 2025 | Qwen2.5-VL Technical Report · code | 96.40 |
| 06 | Ovis2.5-9B | | Aug 2025 | Ovis2.5 Technical Report · code | 96.30 |
| 07 | Qwen3-VL-8B-Instruct | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 96.10 |
| 08 | VideoLLaMA3 7B | | Jan 2025 | VideoLLaMA 3: Frontier Multimodal Foundation Models for … · code | 94.90 |
| 09 | MiniCPM-o 4.5-Instruct | | Apr 2026 | MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal … · code | 94.70 |
| 10 | Qwen2-VL 7B | Alibaba | Sep 2024 | Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code | 94.50 |
| 11 | Qianfan-OCR | Open · Baidu Qianfan | Mar 2026 | Qianfan-OCR: A Unified End-to-End Model for Document Int… · code | 92.80 |
| 12 | Aria | | Oct 2024 | Aria: An Open Multimodal Native Mixture-of-Experts Model · code | 92.60 |
| 13 | Llama 3-V (405B) | | Jul 2024 | The Llama 3 Herd of Models · code | 92.60 |
| 14 | ZAYA1-VL-8B | | May 2026 | pwc-dump · code | 92.50 |
| 15 | VideoLLaMA3 2B | | Jan 2025 | VideoLLaMA 3: Frontier Multimodal Foundation Models for … · code | 91.90 |
| 16 | dots.mocr | | Mar 2026 | Multimodal OCR: Parse Anything from Documents · code | 91.85 |
| 17 | Qwen2-VL-2B | | Sep 2024 | Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code | 90.10 |
| 18 | LayoutLMv2 Large + QG | | Dec 2020 | LayoutLMv2: Multi-modal Pre-training for Visually-Rich D… · code | 86.72 |
| 19 | MiniCPM-V 4.6-Thinking (16x) | | May 2026 | pwc-dump | 86.70 |
| 20 | MiniCPM-Llama3-V 2.5 | | Aug 2024 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone · code | 84.80 |
| 21 | AIMv2 ViT-3B/14 + Llama 3.0 8B | | Nov 2024 | Multimodal Autoregressive Pre-training of Large Vision E… · code | 30.40 |
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
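
For readers unfamiliar with the anls column, the following is a minimal sketch of Average Normalized Levenshtein Similarity as it is commonly defined for DocVQA: each prediction is scored against every accepted answer, similarities whose normalized edit distance reaches the 0.5 threshold are zeroed, and the best per-question similarity is averaged and scaled to 0-100. The function names, the lower-casing and whitespace stripping, and the scaling are assumptions made for illustration; this page does not state how each listed score was computed.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (two-row dynamic program)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def anls(predictions, references, tau=0.5):
    """Return ANLS on a 0-100 scale.

    predictions: list of predicted answer strings, one per question.
    references:  list of lists of accepted answers, aligned with predictions.
    tau:         normalized-distance threshold; at or above it the score is 0.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, golds in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return 100.0 * total / len(predictions)
```

The threshold is what separates near-miss recognition errors (partial credit) from wrong answers (no credit), so a one-character slip in a short answer still earns most of the score.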
§ 03 · Progress

4 steps of state of the art.

Each row below marks a model that broke the previous record on anls. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · anls
  1. Dec 29, 2020 · LayoutLMv2 Large + QG · 86.72
  2. Jul 31, 2024 · Llama 3-V (405B) · 92.60
  3. Sep 18, 2024 · Qwen2-VL 72B · Alibaba · 96.50
  4. Nov 26, 2025 · Qwen3-VL-235B-A22B-Instruct · Qwen · 97.10
Fig 3 · SOTA-setting models only. 4 entries span Dec 2020 to Nov 2025.
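
The SOTA line can be read as a running maximum over dated submissions. The sketch below is one way to derive it, not necessarily how Codesota builds the chart: entries are ordered by date, and a row is kept only if it strictly beats the best score seen so far. The function name `sota_steps` is hypothetical. The four record dates come from the list above; the days assigned to the non-record rows are placeholders, since the leaderboard gives only month and year for them.

```python
from datetime import date


def sota_steps(entries):
    """Return the record-setting entries from (date, model, score) tuples.

    Entries are visited in date order; on the same day the higher score is
    visited first, so at most one record is kept per day.
    """
    steps = []
    best = float("-inf")
    for day, model, score in sorted(entries, key=lambda e: (e[0], -e[2])):
        if score > best:  # strict improvement: a tie does not set a record
            steps.append((day, model, score))
            best = score
    return steps


entries = [
    (date(2020, 12, 29), "LayoutLMv2 Large + QG", 86.72),
    (date(2024, 7, 31), "Llama 3-V (405B)", 92.60),
    (date(2024, 8, 1), "MiniCPM-Llama3-V 2.5", 84.80),   # day is a placeholder
    (date(2024, 9, 18), "Qwen2-VL 7B", 94.50),           # day assumed from the 72B row
    (date(2024, 9, 18), "Qwen2-VL 72B", 96.50),
    (date(2024, 10, 1), "Aria", 92.60),                  # day is a placeholder
    (date(2025, 11, 26), "Qwen3-VL-235B-A22B-Thinking", 96.50),  # day assumed
    (date(2025, 11, 26), "Qwen3-VL-235B-A22B-Instruct", 97.10),
]

for day, model, score in sota_steps(entries):
    print(f"{day:%b %d, %Y} · {model} · {score:.2f}")
```

Using a strict comparison matches the statement that each entry improved upon the previous best, which is why later models that only tie an earlier score do not appear as steps.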
§ 04 · Literature

13 papers tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run the script, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with frozen commit + seed
  • 03 · Declared evaluation environment (Python, deps)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies
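
Before submitting, the five requirements above can be checked mechanically. The sketch below is purely illustrative: the `Submission` class, its field names, and `missing_items` are hypothetical and do not describe Codesota's actual submission format, which is defined in the submission guide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Submission:
    """Hypothetical container mirroring the five checklist items."""
    checkpoint_or_endpoint: str = ""                  # 01
    repro_script: str = ""                            # 02
    commit: str = ""                                  # 02 (frozen commit)
    seed: Optional[int] = None                        # 02
    environment: dict = field(default_factory=dict)   # 03, e.g. {"python": ..., "deps": ...}
    scores: dict = field(default_factory=dict)        # 04, metric name -> value
    contact: str = ""                                 # 05


def missing_items(sub: Submission, required_metrics=("anls",)) -> list:
    """Return the checklist items the submission does not yet satisfy."""
    missing = []
    if not sub.checkpoint_or_endpoint:
        missing.append("01 public checkpoint or API endpoint")
    if not (sub.repro_script and sub.commit and sub.seed is not None):
        missing.append("02 reproduction script with frozen commit + seed")
    if not (sub.environment.get("python") and sub.environment.get("deps")):
        missing.append("03 declared evaluation environment (Python, deps)")
    if any(metric not in sub.scores for metric in required_metrics):
        missing.append("04 one row per metric declared by this dataset")
    if not sub.contact:
        missing.append("05 contact for follow-up")
    return missing
```

For this dataset the only declared metric is anls, so `required_metrics` defaults to that single name; an empty list from `missing_items` means every item on the checklist is covered.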