Who leads the MMBench benchmark?

SenseNova-U1-A3B-MoT currently leads MMBench with a score of 91.59 on accuracy.

What is the state-of-the-art score on MMBench?

The state-of-the-art result on MMBench is 91.59 (accuracy), achieved by SenseNova-U1-A3B-MoT as of 2026.

How many models are tracked on MMBench?

Codesota tracks 19 models on MMBench.

When was the MMBench leaderboard last updated?

The MMBench leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2023.

Codesota · Multimodal · Visual Question Answering · MMBenchTasks/Multimodal/Visual Question Answering

Visual Question Answering · benchmark dataset · 2023 · EN

MMBench: Is Your Multi-modal Model an All-around Player?.

Name: MMBench: Is Your Multi-modal Model an All-around Player? Benchmark Results
Creator: Codesota
Published: 2023-01-01
License: https://creativecommons.org/licenses/by/4.0/

Multimodal capability benchmark for vision-language models, covering perception and reasoning abilities across multiple dimensions.

Paper ↗Submit a result ↵

§ 01 · Leaderboard

Best published scores.

20 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.

Primary: accuracy · higher is better

accuracy· primary

20 rows

#	Model	Org	Submitted	Paper / code	accuracy
01	SenseNova-U1-A3B-MoT	SenseTime	May 2026	SenseNova-U1: Unifying Multimodal Understanding and Gene… · code	91.59
02	Qwen2.5-VL 72BOpen	Alibaba	Feb 2025	Qwen2.5-VL Technical Report	90.50
03	InternVL3-78BOpen	Shanghai AI Lab	Jan 2025	InternVL3: Exploring Advanced Training and Test-Time Rec…	90.10
04	LongCat-Flash-Omni	—	Oct 2025	LongCat-Flash-Omni Technical Report · code	89.80
05	Qwen3-VL-235B-A22B-Instruct	Qwen	Nov 2025	Qwen3-VL Technical Report · code	89.30
06	Qwen3-VL-235B-A22B-Thinking	Qwen	Nov 2025	Qwen3-VL Technical Report · code	88.80
07	Qwen2.5-VL-72B	—	Feb 2025	Qwen2.5-VL Technical Report · code	88.60
08	Qwen2-VL 72BOpen	Alibaba	Sep 2024	Qwen2-VL: Enhancing Vision-Language Model's Perception o…	88
09	Infinity-Parser2-Pro	—	May 2026	pwc-dump	87.54
10	Qwen2-VL 72BOpen	Alibaba	Sep 2024	Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code	86.50
11	InternVL2-76BOpen	Shanghai AI Lab	Apr 2024	InternVL: Scaling up Vision Foundation Models and Aligni…	86.50
12	BAGEL (7B MoT)	—	May 2025	Emerging Properties in Unified Multimodal Pretraining · code	85
13	GPT-4oAPI	OpenAI	Oct 2024	SWE-bench Verified	83.40
14	MiniCPM-V 4.6-Thinking (16x)	—	May 2026	pwc-dump	83.10
15	Qwen2-VL 7B	Alibaba	Sep 2024	Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code	83
16	MiniCPM-Llama3-V 2.5	—	Aug 2024	MiniCPM-V: A GPT-4V Level MLLM on Your Phone · code	77.20
17	GPT-4V	—	Mar 2023	GPT-4 Technical Report	75.80
18	Qwen2-VL-2B	—	Sep 2024	Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code	74.90
19	Gemini 1.5 ProAPI	Google	Feb 2024	Gemini 1.5: Unlocking multimodal understanding across mi…	73.90
20	LLaVA-1.5Open	UW-Madison / Microsoft	Oct 2023	Improved Baselines with Visual Instruction Tuning (LLaVA…	67.70

Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.

§ 03 · Progress

6 steps
of state of the art.

Each row below marks a model that broke the previous record on accuracy. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · accuracy

Mar 15, 2023GPT-4V75.80
Apr 25, 2024InternVL2-76BShanghai AI Lab86.50
Sep 18, 2024Qwen2-VL 72BAlibaba88
Jan 22, 2025InternVL3-78BShanghai AI Lab90.10
Feb 19, 2025Qwen2.5-VL 72BAlibaba90.50
May 12, 2026SenseNova-U1-A3B-MoTSenseTime91.59

Fig 3 · SOTA-setting models only. 6 entries span Mar 2023 → May 2026.

§ 04 · Literature

13 papers
tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
May 2026·SenseNova-U1-A3B-MoT
arXiv ↗Code
Qwen3-VL Technical Report
Nov 2025·Qwen3-VL-235B-A22B-Instruct, Qwen3-VL-235B-A22B-Thinking
arXiv ↗Code
LongCat-Flash-Omni Technical Report
Oct 2025·LongCat-Flash-Omni
arXiv ↗Code
Emerging Properties in Unified Multimodal Pretraining
May 2025·BAGEL (7B MoT)
arXiv ↗Code
Qwen2.5-VL Technical Report
Feb 2025·Qwen2.5-VL 72B, Qwen2.5-VL-72B
arXiv ↗
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jan 2025·InternVL3-78B
arXiv ↗
SWE-bench Verified
Oct 2024·GPT-4o
arXiv ↗
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Sep 2024·Qwen2-VL 72B, Qwen2-VL 7B, Qwen2-VL-2B
arXiv ↗
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Aug 2024·MiniCPM-Llama3-V 2.5
arXiv ↗Code
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Apr 2024·InternVL2-76B
arXiv ↗
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Feb 2024·Gemini 1.5 Pro
arXiv ↗
Improved Baselines with Visual Instruction Tuning (LLaVA-1.5)
Oct 2023·LLaVA-1.5
arXiv ↗
GPT-4 Technical Report
Mar 2023·GPT-4V
arXiv ↗

§ 06 · Contribute

Have a score that beats
this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and — if it takes the top — annotate the step on the progress chart with your name.

Submit a result ↵Read submission guide

What a submission needs

01A public checkpoint or API endpoint
02A reproduction script with frozen commit + seed
03Declared evaluation environment (Python, deps)
04One row per metric declared by this dataset
05A contact so we can follow up on discrepancies

MMBench: Is Your Multi-modal Model an All-around Player?.

Best published scores.

6 stepsof state of the art.

13 paperstied to this benchmark.

Neighbouring benchmarks.

Have a score that beatsthis table?

6 steps
of state of the art.

13 papers
tied to this benchmark.

Have a score that beats
this table?