MME is a comprehensive evaluation benchmark for Multimodal Large Language Models (MLLMs) that assesses both perception and cognition abilities across 14 subtasks. The benchmark features manually designed instruction-answer pairs to prevent data leakage and uses concise instruction design to facilitate fair comparisons among MLLMs. Over 50 advanced MLLMs have been evaluated using MME, providing quantitative analysis and highlighting areas for improvement in multimodal model development.
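For context on how a score is derived, below is a minimal sketch of the subtask scoring the MME paper describes: per-question accuracy plus per-image accuracy+ (both yes/no questions attached to an image must be answered correctly), so each subtask tops out at 200 points. The record keys (image_id, prediction, answer) and the demo data are hypothetical placeholders, not a prescribed submission format.

```python
from collections import defaultdict

def mme_subtask_score(records):
    """Score one MME subtask from a list of per-question records.

    Each record is a dict with (hypothetical) keys:
      "image_id"   - identifier of the image,
      "prediction" - model output, "yes" or "no",
      "answer"     - ground-truth label, "yes" or "no".
    Every image carries two yes/no questions. The subtask score is
    accuracy (per question) plus accuracy+ (per image, both questions
    correct), each expressed as a percentage, so the maximum is 200.
    """
    per_image = defaultdict(list)
    for r in records:
        correct = r["prediction"].strip().lower() == r["answer"].strip().lower()
        per_image[r["image_id"]].append(correct)

    n_questions = sum(len(flags) for flags in per_image.values())
    acc = 100.0 * sum(sum(flags) for flags in per_image.values()) / n_questions
    acc_plus = 100.0 * sum(all(flags) for flags in per_image.values()) / len(per_image)
    return acc + acc_plus


if __name__ == "__main__":
    # Tiny usage example with made-up records for two images.
    demo = [
        {"image_id": "0001", "prediction": "yes", "answer": "yes"},
        {"image_id": "0001", "prediction": "no",  "answer": "no"},
        {"image_id": "0002", "prediction": "yes", "answer": "no"},
        {"image_id": "0002", "prediction": "no",  "answer": "no"},
    ]
    print(mme_subtask_score(demo))  # 75.0 acc + 50.0 acc+ = 125.0
```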
No results indexed yet — be the first to submit a score.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the corresponding step on the progress chart with your name.