MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark of 11.5K multimodal questions drawn from college-level exams, quizzes, and textbooks. The questions span six disciplines (Art, Business, Science, Health, Humanities, and Tech), 30 subjects in total, and 183 subfields. Answering them requires expert-level domain knowledge and deliberate reasoning over images, diagrams, and text, and some questions involve multiple images. Introduced in late 2023, MMMU has been a key benchmark for vision-language model (VLM) reasoning since early 2024.
Accuracy, the share of questions answered correctly, is the metric reported for MMMU. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
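To make the metric concrete, here is a minimal sketch of how accuracy could be computed, assuming each model prediction has already been reduced to a single answer string (for example, an option letter) and is compared to the reference by exact match after light normalization. The function names `normalize`, `accuracy`, and `accuracy_by_group` are hypothetical. The official MMMU evaluation parses answers out of free-form model responses using its own rules, and this sketch does not reproduce them.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Simplified normalization (an assumption): trim whitespace and lowercase."""
    return answer.strip().lower()


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)


def accuracy_by_group(
    predictions: list[str], references: list[str], groups: list[str]
) -> dict[str, float]:
    """Hypothetical helper: one accuracy score per group label, e.g. per discipline."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for p, r, g in zip(predictions, references, groups):
        totals[g] += 1
        hits[g] += normalize(p) == normalize(r)  # bool adds as 0 or 1
    return {g: hits[g] / totals[g] for g in totals}


preds = ["A", "c", "B", "D"]
refs = ["A", "C", "D", "D"]
print(accuracy(preds, refs))  # 0.75
print(accuracy_by_group(preds, refs, ["Art", "Art", "Science", "Science"]))
# {'Art': 1.0, 'Science': 0.5}
```

Because a score is the fraction of correct answers, values closer to 1.0 (or 100%) indicate a stronger model, which is why higher is better on this leaderboard.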