Visual Question Answering | 2023 | en

MMBench: Is Your Multi-modal Model an All-around Player?

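A comprehensive evaluation benchmark for multimodal models, covering 20 ability dimensions, including object recognition, attribute reasoning, spatial reasoning, and commonsense reasoning. It contains more than 3,000 multiple-choice questions. Its CircularEval strategy reduces positional bias: each question is asked once per circular shift of its answer choices and counts as correct only if every pass is answered correctly. Maintained by the OpenCompass team at Shanghai AI Lab and widely used for VLM evaluation in 2024-2025.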
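As a rough illustration of the CircularEval idea, the sketch below rotates the answer choices of one question and accepts the question only if a predictor is correct under every rotation. It is a minimal sketch, not the benchmark's official evaluation code: the `Predictor` signature, the function names, and the two toy models are assumptions made for this example.

```python
from typing import Callable, Sequence

# Hypothetical predictor signature: (question, choices) -> index of the chosen option.
Predictor = Callable[[str, Sequence[str]], int]


def circular_eval(question: str, choices: Sequence[str], answer_idx: int,
                  predict: Predictor) -> bool:
    """Return True only if `predict` is right under every circular shift."""
    n = len(choices)
    for shift in range(n):
        # Rotate the options left by `shift` positions.
        rotated = [choices[(i + shift) % n] for i in range(n)]
        # The correct option moves to this index after the rotation.
        target = (answer_idx - shift) % n
        if predict(question, rotated) != target:
            return False
    return True


def always_first(question: str, choices: Sequence[str]) -> int:
    """A position-biased toy model that always picks the first option."""
    return 0


def content_aware(question: str, choices: Sequence[str]) -> int:
    """A toy model that finds the right option wherever it is placed."""
    return list(choices).index("cat")


options = ["cat", "dog", "bird", "fish"]
biased_result = circular_eval("Which animal is shown?", options, 0, always_first)
robust_result = circular_eval("Which animal is shown?", options, 0, content_aware)
print(biased_result, robust_result)
```

The position-biased toy model is right only when the correct option happens to sit first, so it fails the check, while the content-aware one passes. This is why a circular protocol tends to yield lower, but more trustworthy, accuracy than a single pass.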

Samples: 3,000
Metrics: accuracy
Paper / Website
Current State of the Art

Qwen2.5-VL 72B

Alibaba

90.5

accuracy

Accuracy Progress Over Time

Showing 5 breakthroughs from Mar 2023 to Feb 2025

[Chart: accuracy (y-axis, ticks from 74.3 to 92.0) plotted against date (x-axis, Mar 2023 to Feb 2025)]

Key Milestones

Date      Model                          Score  Gain    Source
Mar 2023  GPT-4V                         75.8   -       MMBench EN test. Reported in multiple comparison papers, including InternVL2 Table 12.
Apr 2024  InternVL2-76B                  86.5   +14.1%  MMBench EN test. Table 12. arxiv:2404.16821
Sep 2024  Qwen2-VL 72B                   88.0   +1.7%   MMBench EN test. Table 6. arxiv:2409.12191
Jan 2025  InternVL3-78B                  90.1   +2.4%   MMBench EN test. Table 2. arxiv:2501.12891
Feb 2025  Qwen2.5-VL 72B (Current SOTA)  90.5   +0.4%   MMBench EN test. Table 2. arxiv:2502.13923

Total Improvement: 19.4%
Time Span: 1y 11m
Breakthroughs: 5
Current SOTA: 90.5
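
The gain percentages in the milestones appear to be relative improvements over the previous best score rather than absolute point differences. The following sketch reproduces them from the listed scores under that assumption; the variable and function names are chosen for this example only.

```python
# Milestone scores as listed above (MMBench EN test accuracy).
milestones = [
    ("GPT-4V", 75.8),
    ("InternVL2-76B", 86.5),
    ("Qwen2-VL 72B", 88.0),
    ("InternVL3-78B", 90.1),
    ("Qwen2.5-VL 72B", 90.5),
]


def relative_gain(prev: float, curr: float) -> float:
    """Relative improvement of `curr` over `prev`, in percent, to one decimal."""
    return round((curr - prev) / prev * 100, 1)


# Gain of each milestone over the one before it.
step_gains = [
    relative_gain(prev[1], curr[1])
    for prev, curr in zip(milestones, milestones[1:])
]
# Gain of the latest milestone over the first one.
total_gain = relative_gain(milestones[0][1], milestones[-1][1])

print(step_gains)   # [14.1, 1.7, 2.4, 0.4]
print(total_gain)   # 19.4
```

In absolute terms the total change is 14.7 accuracy points (75.8 to 90.5), so the 19.4% figure should be read as a relative gain.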

Top Models Performance Comparison

Top 8 models ranked by accuracy

Rank  Model           Accuracy  % of best
1     Qwen2.5-VL 72B  90.5      100.0%
2     InternVL3-78B   90.1      99.6%
3     Qwen2-VL 72B    88.0      97.2%
4     InternVL2-76B   86.5      95.6%
5     GPT-4o          83.4      92.2%
6     GPT-4V          75.8      83.8%
7     Gemini 1.5 Pro  73.9      81.7%
8     LLaVA-1.5       67.7      74.8%

Best Score: 90.5
Top Model: Qwen2.5-VL 72B
Models Compared: 8
Score Range: 22.8
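
For readers who want to check the comparison figures, this short sketch derives the "% of best" column and the score range from the eight listed accuracies. It assumes "% of best" means each score divided by the top score, which matches the listed values; the dictionary and variable names are illustrative.

```python
# Accuracy scores as listed in the comparison above.
scores = {
    "Qwen2.5-VL 72B": 90.5,
    "InternVL3-78B": 90.1,
    "Qwen2-VL 72B": 88.0,
    "InternVL2-76B": 86.5,
    "GPT-4o": 83.4,
    "GPT-4V": 75.8,
    "Gemini 1.5 Pro": 73.9,
    "LLaVA-1.5": 67.7,
}

best = max(scores.values())
# Each score as a percentage of the best score, to one decimal.
pct_of_best = {model: round(s / best * 100, 1) for model, s in scores.items()}
# Gap between the best and the worst listed score, in accuracy points.
score_range = round(best - min(scores.values()), 1)

for model, pct in pct_of_best.items():
    print(f"{model}: {pct}%")
print("Score range:", score_range)
```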

accuracy (Primary)

#  Model           Access       Organization            Score  Date
1  Qwen2.5-VL 72B  Open Source  Alibaba                 90.5   Feb 2025
2  InternVL3-78B   Open Source  Shanghai AI Lab         90.1   Jan 2025
3  Qwen2-VL 72B    Open Source  Alibaba                 88.0   Sep 2024
4  InternVL2-76B   Open Source  Shanghai AI Lab         86.5   Apr 2024
5  GPT-4o          API          OpenAI                  83.4   Oct 2024
6  GPT-4V          -            -                       75.8   Mar 2023
7  Gemini 1.5 Pro  API          Google                  73.9   Feb 2024
8  LLaVA-1.5       Open Source  UW-Madison / Microsoft  67.7   Oct 2023

Related Papers (8)

Other Visual Question Answering Datasets