Who leads the MMBench benchmark?

SenseNova-U1-A3B-MoT currently leads MMBench with a score of 91.59 on Accuracy.

What is the state-of-the-art score on MMBench?

The state-of-the-art result on MMBench is 91.59 (Accuracy), achieved by SenseNova-U1-A3B-MoT as of 2026.

How many models are tracked on MMBench?

Codesota tracks 19 models on MMBench.

When was the MMBench leaderboard last updated?

The MMBench leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2023.



MMBench.

Name: MMBench Benchmark Results
Creator: Unknown
Published: 2023-01-01
License: https://creativecommons.org/licenses/by/4.0/

Multimodal capability benchmark for vision-language models, covering perception and reasoning abilities across multiple dimensions.


§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


Accuracy

Accuracy is the reported evaluation metric for MMBench. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Accuracy: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published: an earlier or same-year result already scored better.

Rank	Model	Evaluation note	Trust	Score	Year
01	SenseNova-U1-A3B-MoT	-	unverified	91.59	2026
02	Qwen2.5-VL 72B	MMBench EN test; Table 2, arxiv:2502.13923	verified	90.5	2026
03	InternVL3-78B	MMBench EN test; Table 2, arxiv:2501.12891	verified	90.1	2025
04	LongCat-Flash-Omni	-	unverified	89.8	2025
05	Qwen3-VL-235B-A22B-Instruct	-	unverified	89.3	2025
06	Qwen3-VL-235B-A22B-Thinking	-	unverified	88.8	2025
07	Qwen2.5-VL-72B	-	unverified	88.6	2025
08	Qwen2-VL 72B	MMBench EN test; Table 6, arxiv:2409.12191	verified	88	2024
09	Infinity-Parser2-Pro	-	unverified	87.54	2026
10	InternVL2-76B	MMBench EN test; Table 12, arxiv:2404.16821	verified	86.5	2024
11	BAGEL (7B MoT)	-	unverified	85	2025
12	GPT-4o	MMBench EN test; system card Table 1, arxiv:2410.21276	verified	83.4	2026
13	MiniCPM-V 4.6-Thinking (16x)	-	unverified	83.1	2026
14	Qwen2-VL 7B	-	unverified	83	2024
15	MiniCPM-Llama3-V 2.5	-	unverified	77.2	2024
16	GPT-4V	MMBench EN test; reported in multiple comparison papers incl. InternVL2 Table 12	verified	75.8	2026
17	Qwen2-VL-2B	-	unverified	74.9	2024
18	Gemini 1.5 Pro	MMBench EN dev; Table 5, arxiv:2403.05530	verified	73.9	2026
19	LLaVA-1.5	MMBench EN dev; 13B; Table 1, arxiv:2310.03744	verified	67.7	2023
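
The muting rule stated above the table can be checked by hand. The following is a minimal sketch, not Codesota's actual implementation: it assumes results are compared by listed year only (the site may use exact publication dates), and the names `Result` and `is_muted` are hypothetical. The rows are a subset copied from the table.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    score: float  # Accuracy, higher is better
    year: int     # year as listed in the table


def is_muted(row: Result, rows: list[Result]) -> bool:
    """Return True if an earlier or same-year result already scored better.

    Assumption: only the listed year is available, so "earlier or same-year"
    is approximated as other.year <= row.year.
    """
    return any(o.year <= row.year and o.score > row.score for o in rows)


# Subset of leaderboard rows, copied from the table.
ROWS = [
    Result("SenseNova-U1-A3B-MoT", 91.59, 2026),
    Result("Qwen2.5-VL 72B", 90.5, 2026),
    Result("InternVL3-78B", 90.1, 2025),
    Result("LongCat-Flash-Omni", 89.8, 2025),
    Result("Qwen2-VL 72B", 88.0, 2024),
    Result("InternVL2-76B", 86.5, 2024),
    Result("LLaVA-1.5", 67.7, 2023),
]

# Rows that were state of the art when published (i.e. not muted).
sota_when_published = [r.model for r in ROWS if not is_muted(r, ROWS)]
print(sota_when_published)
```

Under this year-only approximation, each listed year contributes at most one distinct non-muted score, which is what a year-over-year SOTA history would plot.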

