Codesota · Benchmark · MMMU-Pro

MMMU-Pro.

A harder MMMU variant with vision-only questions and ten answer choices, designed to close the text-only shortcuts that models exploited in the original.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


Accuracy

Accuracy is the reported evaluation metric for MMMU-Pro. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
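As a rough illustration of the metric, the sketch below scores multiple-choice predictions as the percentage of exact matches against the gold option letter. It assumes that leaderboard scores are percentages and that answers are single letters (A through J for ten options); the function names `accuracy` and `chance_accuracy` are hypothetical, and the actual MMMU-Pro evaluation harness may parse and score responses differently.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of predictions that match the gold option letter.

    Assumption: each item is a single option letter; comparison ignores
    case and surrounding whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty set")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy (in percent) of uniform random guessing."""
    return 100.0 / num_options


# Toy data for illustration only; not taken from the benchmark.
preds = ["A", "c", "J", "B"]
gold = ["A", "C", "D", "B"]
score = accuracy(preds, gold)      # 3 of 4 correct
baseline = chance_accuracy(10)     # ten answer choices
```

With ten answer choices, uniform guessing yields 10%, compared with 25% for a four-option format, which is one reason scores on a ten-option benchmark tend to sit lower for the same model.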

Trust tiers for Accuracy: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | Gemini-3.1-Pro | verified | 82 | 2026 | Source |
| 02 | GPT-5.2 | verified | 81 | 2025 | Source |
| 03 | Gemini 3 Pro | verified | 80 | 2026 | Source |
| 04 | Kimi K2.6 | unverified | 79.4 | 2026 | Paper |
| 05 | Qwen3.5-397B-A17B | unverified | 79 | 2026 | Paper, Code |
| 06 | Kimi-K2.5 | unverified | 78.5 | 2026 | Paper, Code |
| 07 | Qwen3.5-122B-A10B | unverified | 76.9 | 2026 | Paper, Code, Source |
| 08 | Gemma 4 31B | unverified | 76.9 | 2026 | Paper |
| 09 | GPT-5.1 | verified | 76.5 | 2025 | Source |
| 10 | Qwen3.6-27B | unverified | 75.8 | 2026 | Paper, Code |
| 11 | Qwen3.6-35B-A3B | unverified | 75.3 | 2026 | Paper, Code |
| 12 | Qwen3.5-35B-A3B | unverified | 75.1 | 2026 | Paper, Code, Source |
| 13 | Qwen3.5-27B | unverified | 75 | 2026 | Paper, Code, Source |
| 14 | Qwen3.5-Omni-Plus | unverified | 73.9 | 2026 | Paper |
| 15 | Qwen3.6 Plus | verified | 73.8 | 2026 | Source |
| 16 | SenseNova-U1-A3B-MoT | unverified | 72.83 | 2026 | Paper, Code |
| 17 | Intern-S1-Pro | unverified | 72.8 | 2026 | Paper, Source |
| 18 | Qwen3-VL-235B-A22B-Thinking | unverified | 69.3 | 2025 | Paper, Code |
| 19 | Qwen3-VL-235B-A22B-Instruct | unverified | 68.1 | 2025 | Paper, Code |
| 20 | Qwen3-Omni-Flash-Thinking | unverified | 60.8 | 2025 | Paper, Code |
| 21 | Qwen3-VL-8B-Instruct | unverified | 55.9 | 2025 | Paper, Code |
| 22 | Ovis2.5-9B | unverified | 54.4 | 2025 | Paper, Code |
| 23 | MiniMax-VL-01 | unverified | 52.7 | 2025 | Paper, Code |
| 24 | Qwen2.5-VL-72B | unverified | 51.1 | 2025 | Paper, Code |
| 25 | Kimi-VL-A3B-Thinking-2506 | unverified | 46.3 | 2025 | Paper, Code |
| 26 | Qwen2-VL 72B | unverified | 46.2 | 2024 | Paper, Code |
| 27 | Qwen2-VL 7B | unverified | 43.5 | 2024 | Paper, Code |
| 28 | Qwen2-VL-2B | unverified | 37.6 | 2024 | Paper, Code |
| 29 | VideoLLaMA3 7B | unverified | 33.6 | 2025 | Paper, Code |
| 30 | MiniCPM-V 4.6-Thinking (16x) | unverified | 32.5 | 2026 | Paper |
| 31 | VideoLLaMA3 2B | unverified | 28.6 | 2025 | Paper, Code |

§ 03 · Lineage

MMMU-Pro in context.

This benchmark (1)
MMMU-Pro · active · 2024-09
None yet; this is the current frontier.
§ 04 · Submit a result

Add to the leaderboard.
