MMBench is a multimodal capability benchmark for vision-language models, covering perception and reasoning abilities across multiple dimensions.
Accuracy is the reported evaluation metric for MMBench. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher accuracy is better.
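As a rough illustration of the metric, the sketch below computes accuracy as the percentage of predictions that exactly match the reference answers. It assumes plain exact-match scoring over multiple-choice letters; the benchmark's official evaluation protocol may differ. The function name `accuracy` and the sample answers are hypothetical and are not taken from MMBench or Codesota.

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions that match the reference answers.

    Assumption: simple exact-match scoring, one predicted option per question.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted option equals the reference option.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical multiple-choice outputs (illustrative only, not real benchmark data).
predicted = ["A", "C", "B", "D"]
reference = ["A", "C", "D", "D"]
score = accuracy(predicted, reference)
print(f"accuracy: {score:.1f}%")
```

Because the score is a share of correct answers, a model that answers more questions correctly always receives a higher value.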