
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models.

MME is a comprehensive evaluation benchmark for Multimodal Large Language Models (MLLMs) that measures both perception and cognition abilities across 14 subtasks. All instruction-answer pairs are manually designed rather than drawn directly from public datasets, which avoids data leakage, and the instructions are kept concise so that models can be compared fairly without prompt engineering. Over 50 advanced MLLMs have been evaluated on MME, yielding quantitative comparisons and highlighting directions for improvement in multimodal model development.
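
For context on how MME reports scores: each image comes with two yes/no questions, and every subtask is scored as accuracy (over individual questions) plus accuracy+ (over images, where an image counts only if both of its questions are answered correctly), giving a maximum of 200 per subtask. Below is a minimal Python sketch of that convention; the prediction records are hypothetical.

```python
from collections import defaultdict

def mme_subtask_score(records):
    """Score one MME subtask from (image_id, is_correct) pairs.

    MME pairs two yes/no questions with every image. The subtask
    score is accuracy (fraction of questions answered correctly)
    plus accuracy+ (fraction of images with BOTH questions answered
    correctly), each as a percentage, so the maximum is 200.
    """
    per_image = defaultdict(list)
    for image_id, is_correct in records:
        per_image[image_id].append(is_correct)

    n_questions = sum(len(answers) for answers in per_image.values())
    accuracy = 100.0 * sum(
        a for answers in per_image.values() for a in answers
    ) / n_questions
    accuracy_plus = 100.0 * sum(
        all(answers) for answers in per_image.values()
    ) / len(per_image)
    return accuracy + accuracy_plus

# Hypothetical results: image "0001" answers both questions correctly,
# image "0002" gets one of its two questions wrong.
records = [("0001", True), ("0001", True), ("0002", True), ("0002", False)]
print(mme_subtask_score(records))  # 75.0 + 50.0 = 125.0
```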

Paper · Submit a result
§ 01 · Leaderboard

Best published scores.

No results indexed yet — be the first to submit a score.

§ 02 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

Submit a result · Read submission guide
What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with a frozen commit + seed (see the sketch after this list)
  • 03 · A declared evaluation environment (Python version, dependencies)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies
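
To make items 02 and 03 concrete, here is one possible shape for a reproduction entry point. This is a sketch, not a required Codesota format: the file name, the --checkpoint flag, the commit hash, and the manifest fields are all hypothetical.

```python
# reproduce.py: hypothetical reproduction entry point, not a required format.
import argparse
import json
import platform
import random
import sys

PINNED_COMMIT = "abc1234"  # hypothetical: the frozen commit your score was produced at
SEED = 42                  # the exact seed used for the reported score

def main() -> None:
    parser = argparse.ArgumentParser(description="Reproduce a reported MME score")
    parser.add_argument("--checkpoint", required=True,
                        help="public checkpoint path or API endpoint")
    args = parser.parse_args()

    # Freeze every source of randomness the evaluation touches (item 02).
    random.seed(SEED)

    # Declare the evaluation environment alongside the score (item 03).
    manifest = {
        "commit": PINNED_COMMIT,
        "seed": SEED,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "checkpoint": args.checkpoint,
    }
    print(json.dumps(manifest, indent=2))

    # ... run the MME evaluation here and emit one row per metric (item 04).

if __name__ == "__main__":
    main()
```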