15 results · 15 benchmarks
Model card

Qwen2.5-VL-72B.

§ 02 · Benchmarks

Every benchmark Qwen2.5-VL-72B has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank |
|---|---|---|---|---|---|
| 01 | HumanEval+ | Computer Code · Code Generation | pass-1 | 87.8% | #2/12 |
| 02 | DocVQA | Computer Vision · Document Understanding | anls | 96.4% | #5/21 |
| 03 | MMBench | Multimodal · Visual Question Answering | accuracy | 88.6% | #7/20 |
| 04 | Video-MME | Multimodal · Video Understanding | accuracy | 79.1% | #8/24 |
| 05 | TextVQA | Multimodal · Visual Question Answering | accuracy | 83.5% | #9/23 |
| 06 | MVBench | Multimodal · Video Understanding | accuracy | 70.4% | #12/20 |
| 07 | RealWorldQA | Multimodal · Visual Question Answering | accuracy | 75.7% | #12/23 |
| 08 | MMStar | Multimodal · Image-Text-to-Text | accuracy | 70.8% | #14/21 |
| 09 | MMMU | Multimodal · Image-Text-to-Text | accuracy | 70.2% | #15/36 |
| 10 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 95.3% | #23/48 |
| 11 | MMMU-Pro | Multimodal · Visual Question Answering | accuracy | 51.1% | #24/31 |
| 12 | MATH | Reasoning · Mathematical Reasoning | accuracy | 83.0% | #26/46 |
| 13 | OSWorld | Agentic AI · Web & Desktop Agents | success-rate | 8.8% | #26/28 |
| 14 | MMLU-Pro | Reasoning · Commonsense Reasoning | accuracy | 71.2% | #61/73 |
| 15 | GPQA Diamond | Reasoning · Multi-step Reasoning | accuracy | 49.0% | #67/74 |
The Rank column shows this model's position among all models scored on the same benchmark and metric, with the number of competitors after the slash. A rank of #1 (shown in red) marks the current SOTA. Rows are sorted by rank, then by newest result.
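As a rough illustration of how a rank cell such as `#15/36` can be read, here is a minimal Python sketch. The helper names `parse_rank` and `is_sota` are hypothetical and not part of any Codesota API; the sketch only assumes the `#position/count` shape visible in the rows above.

```python
import re

# Assumed cell shape, taken from the table rows: "#<position>/<count>"
RANK_RE = re.compile(r"^#(\d+)/(\d+)$")


def parse_rank(cell):
    """Split a rank cell like '#15/36' into (position, count_after_slash)."""
    match = RANK_RE.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised rank cell: {cell!r}")
    return int(match.group(1)), int(match.group(2))


def is_sota(cell):
    """A position of 1 is what the page marks as current SOTA."""
    return parse_rank(cell)[0] == 1


# Sorting cells by position mirrors the primary sort key of the table.
cells = ["#15/36", "#2/12", "#67/74"]
ordered = sorted(cells, key=lambda c: parse_rank(c)[0])
```

With the three sample cells, `ordered` starts with `#2/12` and ends with `#67/74`, the same relative order in which those rows appear in the table.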
§ 03 · Strengths by area

Where Qwen2.5-VL-72B actually performs.

Computer Code · 1 benchmark · avg rank #2.0
Computer Vision · 1 benchmark · avg rank #5.0
Multimodal · 8 benchmarks · avg rank #12.6
Agentic AI · 1 benchmark · avg rank #26.0
Reasoning · 4 benchmarks · avg rank #44.3
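The per-area averages appear to be the mean of the rank positions listed in § 02, rounded to one decimal place. The sketch below reproduces them under that assumption; the function name `average_rank_by_area` is hypothetical, and round-half-up is assumed because it matches the displayed #44.3 for Reasoning (the exact mean is 44.25).

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# (benchmark, area, rank position) copied from the § 02 benchmark table.
ROWS = [
    ("HumanEval+", "Computer Code", 2),
    ("DocVQA", "Computer Vision", 5),
    ("MMBench", "Multimodal", 7),
    ("Video-MME", "Multimodal", 8),
    ("TextVQA", "Multimodal", 9),
    ("MVBench", "Multimodal", 12),
    ("RealWorldQA", "Multimodal", 12),
    ("MMStar", "Multimodal", 14),
    ("MMMU", "Multimodal", 15),
    ("GSM8K", "Reasoning", 23),
    ("MMMU-Pro", "Multimodal", 24),
    ("MATH", "Reasoning", 26),
    ("OSWorld", "Agentic AI", 26),
    ("MMLU-Pro", "Reasoning", 61),
    ("GPQA Diamond", "Reasoning", 67),
]


def average_rank_by_area(rows):
    """Return {area: (benchmark_count, mean_rank_rounded_to_0.1)}."""
    ranks = defaultdict(list)
    for _benchmark, area, rank in rows:
        ranks[area].append(rank)
    summary = {}
    for area, values in ranks.items():
        mean = Decimal(sum(values)) / Decimal(len(values))
        # Assumption: the page rounds half up (44.25 is displayed as 44.3).
        rounded = mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
        summary[area] = (len(values), rounded)
    return summary


for area, (count, avg) in average_rank_by_area(ROWS).items():
    print(f"{area}: {count} benchmark(s), avg rank #{avg}")
```

Because this is an unweighted mean of positions, it ignores how many models were scored on each benchmark: #26/28 on OSWorld and #26/46 on MATH contribute the same value of 26.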
§ 04 · Papers

1 paper with results for Qwen2.5-VL-72B.

  1. Qwen2.5-VL Technical Report · 2025-02-19 · 15 results

§ 06 · Sources & freshness

Where these numbers come from.

pwc-dump · 15 results
0 of 15 rows marked verified.