Visual Question Answering | 2024 | en

Massive Multidiscipline Multimodal Understanding

MMMU (Massive Multidiscipline Multimodal Understanding) is a benchmark of 11.5K multimodal questions drawn from college-level exams. It spans six disciplines (Art, Business, Science, Health, Humanities, and Tech), 30 subjects in total, and 183 subfields. Questions require deep reasoning over images, diagrams, and text, and test multi-image understanding and expert-level domain knowledge. It has been a key reasoning benchmark for vision-language models (VLMs) since early 2024.

Samples: 11,550
Metrics: accuracy
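
As a rough illustration of the accuracy metric, the sketch below scores multiple-choice predictions as the percentage that match the reference letter. The function name `accuracy`, the case-insensitive exact-match rule, and the toy data are assumptions for illustration only; the official evaluation may parse model responses differently.

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions that match the reference answers.

    Assumption: answers are option letters compared case-insensitively
    after stripping whitespace. This is not the official scoring script.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical toy data: three of the four predictions are correct.
toy_predictions = ["A", "c", "B", "D"]
toy_answers = ["A", "C", "D", "D"]
toy_score = accuracy(toy_predictions, toy_answers)
print(f"accuracy: {toy_score:.1f}")
```

Scores on this page are reported on the same 0 to 100 scale.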
Paper / Website
Current State of the Art

InternVL3-78B

Shanghai AI Lab

73.3

accuracy

accuracy Progress Over Time

Showing 5 breakthroughs from Mar 2023 to Jan 2025

[Chart: accuracy (y-axis, 55.1 to 75.0) by date (x-axis, Mar 2023 to Jan 2025)]

Key Milestones

Mar 2023: GPT-4V, 56.8
MMMU val, 0-shot. MMMU benchmark paper, Table 1. Source cross-referenced with GPT-4 Technical Report.

Mar 2024: Gemini 1.5 Pro, 62.2 (+9.5%)
MMMU val. Gemini 1.5 paper, Table 5. arXiv:2403.05530

Apr 2024: InternVL2-76B, 67.4 (+8.4%)
MMMU val. InternVL2-76B, Table 10. arXiv:2404.16821

Oct 2024: GPT-4o, 69.1 (+2.5%)
MMMU val. GPT-4o system card, Table 1. arXiv:2410.21276

Jan 2025: InternVL3-78B, 73.3 (+6.1%) (Current SOTA)
MMMU val. InternVL3-78B, Table 2. arXiv:2501.12891

Total Improvement: 29.0%
Time Span: 1y 10m
Breakthroughs: 5
Current SOTA: 73.3
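
The percentage gains and the time span in the milestone summary can be reproduced from the listed scores and dates. The sketch below assumes each gain is relative to the previous milestone's score and that the time span counts whole months between the first and last milestone; the variable and function names are illustrative.

```python
# Milestones as listed on this page: (year, month, model, accuracy).
milestones = [
    (2023, 3, "GPT-4V", 56.8),
    (2024, 3, "Gemini 1.5 Pro", 62.2),
    (2024, 4, "InternVL2-76B", 67.4),
    (2024, 10, "GPT-4o", 69.1),
    (2025, 1, "InternVL3-78B", 73.3),
]


def relative_gain(previous, current):
    """Relative improvement of current over previous, in percent."""
    return 100.0 * (current - previous) / previous


# Gain of each milestone over the one before it.
step_gains = [
    round(relative_gain(prev[3], curr[3]), 1)
    for prev, curr in zip(milestones, milestones[1:])
]

# Gain of the latest milestone over the first one.
total_gain = round(relative_gain(milestones[0][3], milestones[-1][3]), 1)

# Whole months between the first and last milestone, split into years and months.
first, last = milestones[0], milestones[-1]
months = (last[0] - first[0]) * 12 + (last[1] - first[1])
time_span = divmod(months, 12)

print(step_gains)   # per-milestone relative gains in percent
print(total_gain)   # total relative gain in percent
print(f"{time_span[0]}y {time_span[1]}m")
```

Under these assumptions the total of 29.0% is a relative gain; the absolute gain from 56.8 to 73.3 is 16.5 accuracy points.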

Top Models Performance Comparison

Top 10 models ranked by accuracy

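[Chart: accuracy and percentage of the best score, per model]
1. InternVL3-78B: 73.3 (100.0% of best)
2. Gemini 2.0 Flash: 71.9 (98.1% of best)
3. Qwen2.5-VL 72B: 70.2 (95.8% of best)
4. GPT-4o: 69.1 (94.3% of best)
5. Claude 3.5 Sonnet: 68.3 (93.2% of best)
6. InternVL2-76B: 67.4 (92.0% of best)
7. Qwen2-VL 72B: 64.5 (88.0% of best)
8. Gemini 1.5 Pro: 62.2 (84.9% of best)
9. Llama 3.2 Vision 90B: 60.3 (82.3% of best)
10. Claude 3 Opus: 59.4 (81.0% of best)
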
Best Score: 73.3
Top Model: InternVL3-78B
Models Compared: 10
Score Range: 13.9
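
The "% of best" values and the score range in this comparison follow directly from the ten listed scores. The sketch below assumes "% of best" is each score divided by the top score, and that the score range is the top score minus the lowest of the ten; the names are illustrative.

```python
# Top 10 accuracy scores as listed on this page.
scores = {
    "InternVL3-78B": 73.3,
    "Gemini 2.0 Flash": 71.9,
    "Qwen2.5-VL 72B": 70.2,
    "GPT-4o": 69.1,
    "Claude 3.5 Sonnet": 68.3,
    "InternVL2-76B": 67.4,
    "Qwen2-VL 72B": 64.5,
    "Gemini 1.5 Pro": 62.2,
    "Llama 3.2 Vision 90B": 60.3,
    "Claude 3 Opus": 59.4,
}

# Identify the top model and its score.
top_model = max(scores, key=scores.get)
best_score = scores[top_model]

# Each score as a percentage of the best score, rounded to one decimal.
percent_of_best = {
    model: round(100.0 * score / best_score, 1)
    for model, score in scores.items()
}

# Spread between the best and the lowest of the ten scores.
score_range = round(best_score - min(scores.values()), 1)

for model, pct in percent_of_best.items():
    print(f"{model}: {scores[model]} ({pct}% of best)")
print(f"score range: {score_range}")
```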

accuracy (Primary)

# | Model | Access | Organization | Score | Date
1 | InternVL3-78B | Open Source | Shanghai AI Lab | 73.3 | Jan 2025
2 | Gemini 2.0 Flash | API | Google | 71.9 | Jan 2025
3 | Qwen2.5-VL 72B | Open Source | Alibaba | 70.2 | Feb 2025
4 | GPT-4o | API | OpenAI | 69.1 | Oct 2024
5 | Claude 3.5 Sonnet | API | Anthropic | 68.3 | Oct 2024
6 | InternVL2-76B | Open Source | Shanghai AI Lab | 67.4 | Apr 2024
7 | Qwen2-VL 72B | Open Source | Alibaba | 64.5 | Sep 2024
8 | Gemini 1.5 Pro | API | Google | 62.2 | Feb 2024
9 | Llama 3.2 Vision 90B | Open Source | Meta | 60.3 | Jul 2024
10 | Claude 3 Opus | API | Anthropic | 59.4 | Mar 2024
11 | GPT-4V | - | - | 56.8 | Mar 2023

Related Papers (11)

Other Visual Question Answering Datasets