Visual Question Answering · 2023 · en
MMBench: Is Your Multi-modal Model an All-around Player?
A comprehensive evaluation benchmark for multimodal models, covering 20 ability dimensions that include object recognition, attribute reasoning, spatial reasoning, and commonsense reasoning. It contains more than 3,000 multiple-choice questions and uses the CircularEval strategy to counter positional bias in answer selection. Maintained by OpenGVLab and widely used for VLM evaluation in 2024-2025.
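The following is a minimal sketch of a CircularEval-style check, assuming the commonly described formulation: a question with N choices is asked N times, each time with the choices cyclically rotated, and it counts as correct only if the model picks the right option in every rotation. The function names, the `predict` callback signature, and the toy predictors are hypothetical and are not taken from the MMBench codebase.

```python
from typing import Callable, List, Sequence, Tuple

LABELS = "ABCDEFGH"  # option letters shown to the model

# Hypothetical callback: takes (question, choices) and returns a letter such as "B".
Predictor = Callable[[str, Sequence[str]], str]


def circular_eval(question: str, choices: Sequence[str],
                  answer_idx: int, predict: Predictor) -> bool:
    """Return True only if the model is right under every rotation of the choices."""
    n = len(choices)
    for shift in range(n):
        # Rotate the choices left by `shift` positions.
        rotated = [choices[(i + shift) % n] for i in range(n)]
        # The correct choice moves to this position after the rotation.
        correct_pos = (answer_idx - shift) % n
        if predict(question, rotated) != LABELS[correct_pos]:
            return False
    return True


def circular_accuracy(items: List[Tuple[str, Sequence[str], int]],
                      predict: Predictor) -> float:
    """Fraction of items that pass all rotations."""
    passed = sum(circular_eval(q, c, a, predict) for q, c, a in items)
    return passed / len(items)


# Toy predictors for illustration only.
def always_a(question: str, choices: Sequence[str]) -> str:
    """A position-biased model that always answers 'A'."""
    return "A"


def picks_paris(question: str, choices: Sequence[str]) -> str:
    """A content-aware model that finds the choice 'Paris' wherever it sits."""
    return LABELS[list(choices).index("Paris")]
```

A predictor that always answers the same letter passes a single-pass evaluation whenever the correct option happens to sit in that slot, but it fails here as soon as the options rotate. The cost is N model calls per question instead of one.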
Current State of the Art
Qwen2.5-VL 72B
Alibaba
90.5
accuracy
Accuracy Progress Over Time
Showing 5 breakthroughs from Mar 2023 to Feb 2025
Key Milestones
Mar 2023
GPT-4V
MMBench EN test. GPT-4V. Reported in multiple comparison papers, including InternVL2 (Table 12).
75.8
Feb 2025
Qwen2.5-VL 72B (Current SOTA)
MMBench EN test. Qwen2.5-VL 72B. Table 2. arXiv:2502.13923
90.5
+0.4%
Total Improvement
19.4%
Time Span
1y 11m
Breakthroughs
5
Current SOTA
90.5
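The summary figures above can be reproduced from the two listed milestones. The page does not say whether "Total Improvement" is an absolute or a relative gain; the sketch below shows that 19.4% matches the relative gain from 75.8 to 90.5, while the absolute gain is 14.7 points. The helper names are hypothetical.

```python
first_score = 75.8   # GPT-4V, Mar 2023
sota_score = 90.5    # Qwen2.5-VL 72B, Feb 2025


def absolute_gain(old: float, new: float) -> float:
    """Gain in accuracy points."""
    return round(new - old, 1)


def relative_gain_pct(old: float, new: float) -> float:
    """Gain as a percentage of the starting score."""
    return round((new - old) / old * 100, 1)


def span_years_months(y0: int, m0: int, y1: int, m1: int) -> tuple:
    """Elapsed (years, months) between two year/month pairs."""
    return divmod((y1 - y0) * 12 + (m1 - m0), 12)


print(absolute_gain(first_score, sota_score))      # 14.7
print(relative_gain_pct(first_score, sota_score))  # 19.4
print(span_years_months(2023, 3, 2025, 2))         # (1, 11)
```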
Top Models Performance Comparison
Top 8 models ranked by accuracy
Best Score
90.5
Top Model
Qwen2.5-VL 72B
Models Compared
8
Score Range
22.8
Accuracy (primary metric)
| # | Model | Access | Organization | Score | Date |
|---|---|---|---|---|---|
| 1 | Qwen2.5-VL 72B | Open Source | Alibaba | 90.5 | Feb 2025 |
| 2 | InternVL3-78B | Open Source | Shanghai AI Lab | 90.1 | Jan 2025 |
| 3 | Qwen2-VL 72B | Open Source | Alibaba | 88.0 | Sep 2024 |
| 4 | InternVL2-76B | Open Source | Shanghai AI Lab | 86.5 | Apr 2024 |
| 5 | GPT-4o | API | OpenAI | 83.4 | Oct 2024 |
| 6 | GPT-4V | | | 75.8 | Mar 2023 |
| 7 | Gemini 1.5 Pro | API | Google | 73.9 | Feb 2024 |
| 8 | LLaVA-1.5 | Open Source | UW-Madison / Microsoft | 67.7 | Oct 2023 |
Related Papers (8)
Qwen2.5-VL Technical Report
Feb 2025 · Models: Qwen2.5-VL 72B
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jan 2025 · Models: InternVL3-78B
SWE-bench Verified
Oct 2024 · Models: GPT-4o
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Sep 2024 · Models: Qwen2-VL 72B
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Apr 2024 · Models: InternVL2-76B
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Feb 2024 · Models: Gemini 1.5 Pro
Improved Baselines with Visual Instruction Tuning (LLaVA-1.5)
Oct 2023 · Models: LLaVA-1.5
GPT-4 Technical Report
Mar 2023 · Models: GPT-4V