Image-Text-to-Text · benchmark dataset · 2023 · EN

MMMU.

Massive multi-discipline multimodal understanding across 30 subjects

§ 01 · Leaderboard

Best published scores.

36 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.


Primary metric: accuracy (higher is better) · 36 rows
# | Model | Org | Submitted | Paper / code | accuracy
01 | Qwen3.5-397B-A17B [Open] | Alibaba | Feb 2026 | pwc-dump · code | 85
02 | Qwen3.5-122B-A10B [Open] | Alibaba | Feb 2026 | pwc-dump · code | 83.90
03 | Qwen3.6-27B | - | Apr 2026 | pwc-dump · code | 82.90
04 | Qwen3.5-27B [Open] | Alibaba | Feb 2026 | pwc-dump · code | 82.30
05 | Gemini 2.5 Pro | - | Jul 2025 | Gemini 2.5: Pushing the Frontier with Advanced Reasoning… | 82
06 | Qwen3.6-35B-A3B | - | Apr 2026 | pwc-dump · code | 81.70
07 | Qwen3.5-35B-A3B [Open] | Alibaba | Feb 2026 | pwc-dump · code | 81.40
08 | Qwen3-VL-235B-A22B-Thinking | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 80.60
09 | SenseNova-U1-A3B-MoT | SenseTime | May 2026 | SenseNova-U1: Unifying Multimodal Understanding and Gene… · code | 80.55
10 | Qwen3.5-Omni-Plus | - | Apr 2026 | Qwen3.5-Omni Technical Report | 80.10
11 | Gemini 2.5 Flash | - | Jul 2025 | Gemini 2.5: Pushing the Frontier with Advanced Reasoning… | 79.70
12 | Qwen3-VL-235B-A22B-Instruct | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 78.70
13 | InternVL3-78B [Open] | Shanghai AI Lab | Apr 2025 | InternVL3: Exploring Advanced Training and Test-Time Rec… · code | 72.20
14 | Ovis2.5-9B | - | Aug 2025 | Ovis2.5 Technical Report · code | 71.20
15 | Qwen2.5-VL-72B | - | Feb 2025 | Qwen2.5-VL Technical Report · code | 70.20
16 | Qwen3-VL-8B-Instruct | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 69.60
17 | MiniMax-VL-01 | - | Jan 2025 | MiniMax-01: Scaling Foundation Models with Lightning Att… · code | 68.50
18 | MiniCPM-o 4.5-Instruct | - | Apr 2026 | MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal … · code | 67.60
19 | Gemma 3 (27B, IT) | - | Mar 2025 | Gemma 3 Technical Report · code | 64.90
20 | Llama 3-V (405B) | - | Jul 2024 | The Llama 3 Herd of Models · code | 64.50
21 | Qwen2-VL 72B [Open] | Alibaba | Sep 2024 | Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code | 64.50
22 | Kimi-VL-A3B-Thinking-2506 | - | Apr 2025 | Kimi-VL Technical Report · code | 64
23 | Infinity-Parser2-Pro | - | May 2026 | pwc-dump | 61.89
24 | Qwen3-Omni-30B-A3B-Base-202507 | - | Sep 2025 | Qwen3-Omni Technical Report · code | 59.33
25 | Lumina-DiMOO | - | Oct 2025 | Lumina-DiMOO: An Omni Diffusion Large Language Model for… · code | 58.60
26 | Kimi-VL-A3B-Instruct | - | Apr 2025 | Kimi-VL Technical Report · code | 57
27 | BAGEL (7B MoT) | - | May 2025 | Emerging Properties in Unified Multimodal Pretraining · code | 55.30
28 | Aria | - | Oct 2024 | Aria: An Open Multimodal Native Mixture-of-Experts Model · code | 54.90
29 | MiniCPM-V 4.6-Thinking (16x) | - | May 2026 | pwc-dump | 54.50
30 | Qwen2-VL 7B | Alibaba | Sep 2024 | Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code | 54.10
31 | BLIP3-o (8B) | - | May 2025 | BLIP3-o: A Family of Fully Open Unified Multimodal Model… · code | 50.60
32 | VideoLLaMA3 7B | - | Jan 2025 | VideoLLaMA 3: Frontier Multimodal Foundation Models for … · code | 48.80
33 | ZAYA1-VL-8B | - | May 2026 | pwc-dump · code | 46
34 | MiniCPM-Llama3-V 2.5 | - | Aug 2024 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone · code | 45.80
35 | VideoLLaMA3 2B | - | Jan 2025 | VideoLLaMA 3: Frontier Multimodal Foundation Models for … · code | 45.30
36 | Qwen2-VL-2B | - | Sep 2024 | Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code | 41.10
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
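
The ordering rule stated above (score first, ties broken by submission date) can be sketched in a few lines of Python. This is an illustration only, not Codesota's actual ranking code. The tuple layout and the `rank` helper are assumptions, the earlier-date-first tie-break is inferred from rows 20 and 21, and the day of month is a placeholder because the table shows only month and year.

```python
from datetime import date

# (model, accuracy, submitted) copied from four leaderboard rows.
# The table lists only month and year, so day=1 is a placeholder.
rows = [
    ("Qwen2-VL 72B", 64.50, date(2024, 9, 1)),
    ("Gemini 2.5 Pro", 82.0, date(2025, 7, 1)),
    ("Llama 3-V (405B)", 64.50, date(2024, 7, 1)),
    ("Qwen3.5-397B-A17B", 85.0, date(2026, 2, 1)),
]


def rank(rows):
    """Sort by accuracy descending, then by submission date ascending."""
    return sorted(rows, key=lambda r: (-r[1], r[2]))


ranked = rank(rows)
for position, (model, accuracy, submitted) in enumerate(ranked, start=1):
    print(f"{position:02d} {model} {accuracy:.2f} {submitted:%b %Y}")
```

With this key, the two models tied at 64.50 come out in the same relative order as in the table: Llama 3-V (405B) ahead of Qwen2-VL 72B.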
§ 03 · Progress

6 steps of state of the art.

Each row below marks a model that broke the previous record on accuracy. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · accuracy
  1. Jul 31, 2024 · Llama 3-V (405B) · 64.50
  2. Jan 14, 2025 · MiniMax-VL-01 · 68.50
  3. Feb 19, 2025 · Qwen2.5-VL-72B · 70.20
  4. Apr 14, 2025 · InternVL3-78B · Shanghai AI Lab · 72.20
  5. Jul 7, 2025 · Gemini 2.5 Pro · 82
  6. Feb 16, 2026 · Qwen3.5-397B-A17B · Alibaba · 85
Fig 3 · SOTA-setting models only. 6 entries span Jul 2024 to Feb 2026.
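
The SOTA line described in this section is a running maximum over submissions ordered by date: an entry is kept only if it strictly beats the best score seen so far. The sketch below assumes that reading and is not Codesota's own code. The `sota_line` helper and tuple layout are hypothetical. The six record-setting dates come from the list above. The three other entries take their month and year from the leaderboard, with day=1 as a placeholder.

```python
from datetime import date

# (model, date, accuracy). Exact dates are from the SOTA list; entries marked
# "placeholder day" have only month and year on the leaderboard.
entries = [
    ("Llama 3-V (405B)", date(2024, 7, 31), 64.50),
    ("Qwen2-VL 72B", date(2024, 9, 1), 64.50),  # placeholder day
    ("MiniMax-VL-01", date(2025, 1, 14), 68.50),
    ("Qwen2.5-VL-72B", date(2025, 2, 19), 70.20),
    ("Gemma 3 (27B, IT)", date(2025, 3, 1), 64.90),  # placeholder day
    ("InternVL3-78B", date(2025, 4, 14), 72.20),
    ("Gemini 2.5 Pro", date(2025, 7, 7), 82.0),
    ("Qwen3-VL-235B-A22B-Thinking", date(2025, 11, 1), 80.60),  # placeholder day
    ("Qwen3.5-397B-A17B", date(2026, 2, 16), 85.0),
]


def sota_line(entries):
    """Return the entries that strictly improved on the best earlier score."""
    best = float("-inf")
    steps = []
    for model, when, score in sorted(entries, key=lambda e: e[1]):
        if score > best:  # a tie does not set a new record
            best = score
            steps.append((when, model, score))
    return steps


steps = sota_line(entries)
for when, model, score in steps:
    print(f"{when:%b %d, %Y} {model} {score:.2f}")
```

On this sample, the strict comparison drops Qwen2-VL 72B (a tie at 64.50) and the two later entries that scored below the standing record, leaving the same six steps listed above.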
§ 04 · Literature

20 papers tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with frozen commit + seed
  • 03 · Declared evaluation environment (Python, deps)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies
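
The checklist above can be turned into a quick self-check before submitting. The page does not publish a submission schema, so everything in this sketch is hypothetical: the dictionary keys, the `missing_items` helper, and the placeholder values are illustrative assumptions, not Codesota's format.

```python
# Hypothetical self-check for the five requirements listed above.
REQUIRED_METRICS = ("accuracy",)  # this dataset declares a single metric


def missing_items(submission, metrics=REQUIRED_METRICS):
    """Return the checklist items that the submission dict does not satisfy."""
    problems = []
    if not submission.get("checkpoint_or_endpoint"):
        problems.append("01 public checkpoint or API endpoint")
    repro = submission.get("reproduction", {})
    # Seed 0 is valid, so test for None rather than truthiness.
    if not (repro.get("script") and repro.get("commit")
            and repro.get("seed") is not None):
        problems.append("02 reproduction script with frozen commit + seed")
    env = submission.get("environment", {})
    if not (env.get("python") and env.get("deps")):
        problems.append("03 declared evaluation environment")
    results = submission.get("results", {})
    if any(metric not in results for metric in metrics):
        problems.append("04 one row per declared metric")
    if not submission.get("contact"):
        problems.append("05 contact")
    return problems


# Placeholder values only; none of these describe a real submission.
example = {
    "checkpoint_or_endpoint": "https://example.org/my-checkpoint",
    "reproduction": {"script": "eval_mmmu.py", "commit": "0123abc", "seed": 0},
    "environment": {"python": "3.11", "deps": ["torch"]},
    "results": {"accuracy": 0.0},
    "contact": "author@example.org",
}
print(missing_items(example))
```

Removing any key from `example` makes the matching checklist item appear in the returned list.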