Codesota · Models · Qwen2-VL 72B · Alibaba · 18 results · 12 benchmarks
§ 01 · Model card

Qwen2-VL 72B.

Alibaba · open-source · Vision-Language Model · 1 current SOTA

The largest model in Alibaba's Qwen2-VL family of vision-language models.

§ 02 · Benchmarks

Every benchmark for which Qwen2-VL 72B has a recorded score.

# | Benchmark | Area · Task | Metric | Value | Rank | Date | Source
01 | VQA v2.0 | Multimodal · Visual Question Answering | accuracy | 87.6% | #1/16 | 2024-09-18 | source ↗
02 | CC-OCR | Computer Vision · General OCR Capabilities | kie-f1 | 71.8% | #1/5 | n/a | source ↗
03 | CC-OCR | Computer Vision · General OCR Capabilities | document-parsing | 53.8% | #2/6 | n/a | source ↗
04 | CC-OCR | Computer Vision · General OCR Capabilities | multi-scene-f1 | 78.0% | #2/9 | n/a | source ↗
05 | DocVQA | Computer Vision · Document Understanding | anls | 96.5% | #2/21 | n/a | source ↗
06 | TextVQA | Multimodal · Visual Question Answering | accuracy | 85.5% | #2/23 | n/a | source ↗
07 | CC-OCR | Computer Vision · General OCR Capabilities | multilingual-f1 | 71.1% | #3/8 | n/a | source ↗
08 | TextVQA | Multimodal · Visual Question Answering | accuracy | 84.9% | #4/23 | 2024-09-18 | source ↗
09 | MMBench | Multimodal · Visual Question Answering | accuracy | 88.0% | #8/20 | 2024-09-18 | source ↗
10 | MMBench | Multimodal · Visual Question Answering | accuracy | 86.5% | #10/20 | n/a | source ↗
11 | RealWorldQA | Multimodal · Visual Question Answering | accuracy | 77.8% | #10/23 | n/a | source ↗
12 | MVBench | Multimodal · Video Understanding | accuracy | 73.6% | #11/20 | n/a | source ↗
13 | MMStar | Multimodal · Image-Text-to-Text | accuracy | 68.3% | #18/21 | n/a | source ↗
14 | Video-MME | Multimodal · Video Understanding | accuracy | 71.2% | #18/24 | n/a | source ↗
15 | MMMU | Multimodal · Visual Question Answering | accuracy | 64.5% | #19/30 | 2024-09-18 | source ↗
16 | MMMU | Multimodal · Visual Question Answering | accuracy | 64.5% | #19/30 | n/a | source ↗
17 | MMMU | Multimodal · Image-Text-to-Text | accuracy | 64.5% | #20/36 | n/a | source ↗
18 | MMMU-Pro | Multimodal · Visual Question Answering | accuracy | 46.2% | #26/31 | n/a | source ↗
The Rank column gives this model's position among all models scored on the same benchmark and metric, with the number of competitors after the slash. A #1 shown in red marks the current SOTA. Rows are sorted by rank, then by newest result.
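A minimal sketch of that ordering, assuming each rank is stored as a "#position/competitors" string and that, at equal rank, undated results sort after dated ones. The page does not state the undated rule; it is inferred from the table, which places the dated #19 MMMU row before the undated one. The `Result` type and the `parse_rank` and `sort_key` helpers are hypothetical names for illustration, not Codesota's code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Result:
    # Hypothetical record for one benchmark-table row.
    benchmark: str
    metric: str
    value: float
    rank: str                    # e.g. "#19/30"
    when: Optional[date] = None  # None when the row has no date


def parse_rank(rank: str) -> Tuple[int, int]:
    """Split '#19/30' into (position, number after the slash)."""
    position, competitors = rank.lstrip("#").split("/")
    return int(position), int(competitors)


def sort_key(r: Result) -> Tuple[int, int, int]:
    """Order by rank, then newest first; undated rows go last within a rank."""
    position, _ = parse_rank(r.rank)
    if r.when is None:
        return (position, 1, 0)
    # Negating the ordinal makes later dates sort earlier.
    return (position, 0, -r.when.toordinal())


rows = [
    Result("MMMU", "accuracy", 64.5, "#19/30"),
    Result("MMMU", "accuracy", 64.5, "#19/30", date(2024, 9, 18)),
    Result("VQA v2.0", "accuracy", 87.6, "#1/16", date(2024, 9, 18)),
]
ordered = sorted(rows, key=sort_key)
# VQA v2.0 first, then the dated MMMU row, then the undated one.
```

Rows that tie on both rank and date keep their input order, because Python's `sorted` is stable. The sketch does not derive SOTA status from the rank alone, since the page distinguishes SOTA by a red #1 rather than by every #1.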
§ 03 · Strengths by area

Where Qwen2-VL 72B actually performs.

Computer Vision: 2 benchmarks · avg rank #2.0 · 1 SOTA
Multimodal: 10 benchmarks · avg rank #12.8
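The area figures appear to be plain means of the Rank positions over every result row in an area, with each benchmark and task pair counted as one benchmark. Recomputing from the 18 rows of the benchmark table under that assumption reproduces both cards (2 benchmarks at #2.0, 10 at #12.8) and the page totals of 18 results and 12 benchmarks. This is a reconstruction, not Codesota's published method. `ROWS` and `summarize` are hypothetical names, and SOTA counts are left out because the page does not define how they are derived.

```python
from collections import defaultdict

# (area, benchmark, task, rank position) for the 18 rows of the benchmark table.
ROWS = [
    ("Multimodal", "VQA v2.0", "Visual Question Answering", 1),
    ("Computer Vision", "CC-OCR", "General OCR Capabilities", 1),
    ("Computer Vision", "CC-OCR", "General OCR Capabilities", 2),
    ("Computer Vision", "CC-OCR", "General OCR Capabilities", 2),
    ("Computer Vision", "DocVQA", "Document Understanding", 2),
    ("Multimodal", "TextVQA", "Visual Question Answering", 2),
    ("Computer Vision", "CC-OCR", "General OCR Capabilities", 3),
    ("Multimodal", "TextVQA", "Visual Question Answering", 4),
    ("Multimodal", "MMBench", "Visual Question Answering", 8),
    ("Multimodal", "MMBench", "Visual Question Answering", 10),
    ("Multimodal", "RealWorldQA", "Visual Question Answering", 10),
    ("Multimodal", "MVBench", "Video Understanding", 11),
    ("Multimodal", "MMStar", "Image-Text-to-Text", 18),
    ("Multimodal", "Video-MME", "Video Understanding", 18),
    ("Multimodal", "MMMU", "Visual Question Answering", 19),
    ("Multimodal", "MMMU", "Visual Question Answering", 19),
    ("Multimodal", "MMMU", "Image-Text-to-Text", 20),
    ("Multimodal", "MMMU-Pro", "Visual Question Answering", 26),
]


def summarize(rows):
    """Return {area: (distinct benchmark/task pairs, mean rank to 1 decimal)}."""
    ranks = defaultdict(list)
    benchmarks = defaultdict(set)
    for area, bench, task, rank in rows:
        ranks[area].append(rank)
        benchmarks[area].add((bench, task))
    return {
        area: (len(benchmarks[area]), round(sum(r) / len(r), 1))
        for area, r in ranks.items()
    }


summary = summarize(ROWS)
# summary["Computer Vision"] == (2, 2.0); summary["Multimodal"] == (10, 12.8)
```

Averaging over result rows rather than over benchmarks matters here. Multimodal has 13 rows but only 10 benchmark and task pairs, and the #12.8 figure comes out only when all 13 rows are averaged (166 / 13).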
§ 04 · Papers

2 papers with results for Qwen2-VL 72B.

  1. 2024-09-18 · Multimodal · 4 results

    Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

  2. 2024-09-18 · 10 results

    Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

§ 05 · Related models

Other Alibaba models scored on Codesota.

Qwen3-235B-A22B: 235B (22B active) params · 9 results · 1 SOTA
Qwen3.5-397B-A17B: 8 results
Qwen3.5-122B-A10B: 6 results
Qwen3.5-27B: 6 results
Qwen3.5-35B-A3B: 6 results
Qwen2-VL 7B: 7B params · 5 results
Qwen2.5-72B-Instruct: 72B params · 4 results
Qwen2.5-Coder 32B: 32B params · 4 results
§ 06 · Sources & freshness

Where these numbers come from.

pwc-dump: 10 results
arxiv: 4 results
alphaxiv-leaderboard: 2 results
cc-ocr-paper: 2 results
6 of 18 rows marked verified.