Codesota · Multimodal · Video Understanding · Video-MME
Video Understanding · benchmark dataset · 2024 · EN

Video-MME.

Comprehensive video understanding across diverse video types

§ 01 · Leaderboard

Best published scores.

24 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.


Primary metric: accuracy (higher is better) · 24 rows
| # | Model | Org | Submitted | Paper / code | accuracy |
|---|---|---|---|---|---|
| 01 | Qwen3.6-27B | | Apr 2026 | pwc-dump · code | 87.70 |
| 02 | Qwen3.5-397B-A17B (Open) | Alibaba | Feb 2026 | pwc-dump · code | 87.50 |
| 03 | Kimi-K2.5 (Open) | Moonshot.AI | Feb 2026 | Kimi K2.5: Visual Agentic Intelligence · code | 87.40 |
| 04 | Qwen3.6-35B-A3B | | Apr 2026 | pwc-dump · code | 86.60 |
| 05 | Gemini 2.5 Pro | | Jul 2025 | Gemini 2.5: Pushing the Frontier with Advanced Reasoning… | 84.30 |
| 06 | Qwen3.5-Omni-Plus | | Apr 2026 | Qwen3.5-Omni Technical Report | 81.90 |
| 07 | Qwen3-VL-235B-A22B-Instruct | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 79.20 |
| 08 | Qwen2.5-VL-72B | | Feb 2025 | Qwen2.5-VL Technical Report · code | 79.10 |
| 09 | Qwen3-VL-235B-A22B-Thinking | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 79.00 |
| 10 | LongCat-Flash-Omni | | Oct 2025 | LongCat-Flash-Omni Technical Report · code | 78.20 |
| 11 | SiLVR | | May 2025 | SiLVR: A Simple Language-based Video Reasoning Framework · code | 77.70 |
| 12 | Gemini 2.5 Flash | | Jul 2025 | Gemini 2.5: Pushing the Frontier with Advanced Reasoning… | 75.50 |
| 13 | Ovis2.5-9B | | Aug 2025 | Ovis2.5 Technical Report · code | 72.80 |
| 14 | InternVL3-78B (Open) | Shanghai AI Lab | Apr 2025 | InternVL3: Exploring Advanced Training and Test-Time Rec… · code | 72.70 |
| 15 | Aria | | Oct 2024 | Aria: An Open Multimodal Native Mixture-of-Experts Model · code | 72.10 |
| 16 | Kimi-VL-A3B-Thinking-2506 | | Apr 2025 | Kimi-VL Technical Report · code | 71.90 |
| 17 | Qwen3-VL-8B-Instruct | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 71.40 |
| 18 | Qwen2-VL 72B (Open) | Alibaba | Sep 2024 | Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code | 71.20 |
| 19 | MiniCPM-o 4.5-Instruct | | Apr 2026 | MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal … · code | 70.40 |
| 20 | Kimi-VL-A3B-Instruct | | Apr 2025 | Kimi-VL Technical Report · code | 67.80 |
| 21 | VideoLLaMA3 7B | | Jan 2025 | VideoLLaMA 3: Frontier Multimodal Foundation Models for … · code | 66.20 |
| 22 | Qwen2-VL 7B | Alibaba | Sep 2024 | Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code | 63.30 |
| 23 | LongVU | | Oct 2024 | LongVU: Spatiotemporal Adaptive Compression for Long Vid… · code | 60.60 |
| 24 | Qwen2-VL-2B | | Sep 2024 | Qwen2-VL: Enhancing Vision-Language Model's Perception o… · code | 55.60 |
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
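The ordering rule stated for this table (score first, ties broken by submission date) can be sketched in a few lines of Python. This is only an illustration, not Codesota's actual ranking code: the tuple layout and the `rank` helper are hypothetical, the leaderboard publishes month and year only (so the day is set to 1 here), and the page does not say which direction a tie is broken in, so "earlier submission ranks higher" is an assumption.

```python
from datetime import date

# A few rows copied from the leaderboard: (model, submitted, accuracy).
# Only month and year are published, so day=1 is an assumption made here.
rows = [
    ("Qwen2.5-VL-72B", date(2025, 2, 1), 79.10),
    ("Qwen3.6-27B", date(2026, 4, 1), 87.70),
    ("Qwen3-VL-235B-A22B-Thinking", date(2025, 11, 1), 79.00),
    ("Kimi-K2.5", date(2026, 2, 1), 87.40),
    ("Qwen3-VL-235B-A22B-Instruct", date(2025, 11, 1), 79.20),
]


def rank(rows):
    """Sort by accuracy, highest first.

    On equal accuracy the earlier submission is placed first; that
    tie-break direction is an assumption, not something the page states.
    """
    return sorted(rows, key=lambda r: (-r[2], r[1]))


ranked = rank(rows)
for position, (model, submitted, accuracy) in enumerate(ranked, start=1):
    print(f"{position:02d} {model} {submitted:%b %Y} {accuracy:.2f}")
```

Negating the score inside the sort key keeps the whole rule in a single `sorted` call, so the date only matters when two scores are exactly equal.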
§ 03 · Progress

7 steps of state of the art.

Each row below marks a model that broke the previous record on accuracy. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · accuracy
  1. Sep 18, 2024 · Qwen2-VL 72B · Alibaba · 71.20
  2. Oct 8, 2024 · Aria · 72.10
  3. Feb 19, 2025 · Qwen2.5-VL-72B · 79.10
  4. Jul 7, 2025 · Gemini 2.5 Pro · 84.30
  5. Feb 2, 2026 · Kimi-K2.5 · Moonshot.AI · 87.40
  6. Feb 16, 2026 · Qwen3.5-397B-A17B · Alibaba · 87.50
  7. Apr 21, 2026 · Qwen3.6-27B · 87.70
Fig 3 · SOTA-setting models only. 7 entries span Sep 2024 to Apr 2026.
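One way to derive a SOTA line like this from dated results is a running maximum over the entries in chronological order. The sketch below is illustrative only and is not taken from Codesota: the `sota_line` helper and tuple layout are hypothetical. Exact dates are copied from the progress list; the two non-record rows (VideoLLaMA3 7B, Ovis2.5-9B) come from the leaderboard, which gives only a month, so their day is assumed to be 1.

```python
from datetime import date

# (model, date, accuracy). Exact dates come from the SOTA list; rows marked
# "day assumed" only have a month in the leaderboard.
entries = [
    ("Qwen2-VL 72B", date(2024, 9, 18), 71.20),
    ("Aria", date(2024, 10, 8), 72.10),
    ("VideoLLaMA3 7B", date(2025, 1, 1), 66.20),  # day assumed
    ("Qwen2.5-VL-72B", date(2025, 2, 19), 79.10),
    ("Gemini 2.5 Pro", date(2025, 7, 7), 84.30),
    ("Ovis2.5-9B", date(2025, 8, 1), 72.80),  # day assumed
    ("Kimi-K2.5", date(2026, 2, 2), 87.40),
    ("Qwen3.5-397B-A17B", date(2026, 2, 16), 87.50),
    ("Qwen3.6-27B", date(2026, 4, 21), 87.70),
]


def sota_line(entries):
    """Return the entries that strictly beat every earlier score."""
    best = float("-inf")
    steps = []
    for model, when, score in sorted(entries, key=lambda e: e[1]):
        if score > best:  # a record must be broken, not merely matched
            best = score
            steps.append((model, when, score))
    return steps


steps = sota_line(entries)
for index, (model, when, score) in enumerate(steps, start=1):
    print(f"{index}. {when.isoformat()} {model} {score:.2f}")
```

With these inputs the two non-record rows are filtered out and the seven remaining steps match the list above. The strict comparison reflects the wording "broke the previous record"; a tie with the current best would not add a step.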
§ 04 · Literature

15 papers tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with frozen commit + seed
  • 03 · Declared evaluation environment (Python, deps)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies