Every score we've added, in order.

The append-only public ledger of every benchmark result on Codesota. Each row records when it was written, the date of the result itself, the model, the claimed value, and where the citation lives. New-SOTA rows are marked in colour; unverified rows are still shown, but labelled as such.
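Each row carries the fields listed above. A minimal sketch of that shape, assuming Python dataclasses and hypothetical names (`LedgerRow`, `Ledger`, `most_recent`) that are not Codesota's actual schema: a frozen record plus a store that only grows is enough to capture the append-only property.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)  # frozen: a row cannot be edited once written
class LedgerRow:
    written_at: datetime         # when the row was added to the log
    result_date: Optional[date]  # when the result itself is dated, if known
    model: str
    dataset: str
    value: float                 # the claimed score
    source_url: Optional[str]    # None is rendered as "no source"
    verified: bool               # unverified rows are shown, but labelled
    new_sota: bool


class Ledger:
    """Hypothetical append-only store: rows can be added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list[LedgerRow] = []

    def append(self, row: LedgerRow) -> None:
        # Assumption: rows must arrive in write order, so history is never rewritten.
        if self._rows and row.written_at < self._rows[-1].written_at:
            raise ValueError("row is older than the last written row")
        self._rows.append(row)

    def __len__(self) -> int:
        return len(self._rows)

    def most_recent(self, n: int = 200) -> list[LedgerRow]:
        # Newest first, the order the log page uses; n <= 0 yields nothing.
        if n <= 0:
            return []
        return list(reversed(self._rows[-n:]))
```

Because `LedgerRow` is frozen, correcting a score means appending a new row rather than editing an old one, which is what keeps the trail auditable.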

This is the audit trail. If a score is wrong, this is where the error will be visible; if a source is missing, this is where you'll see the gap.

Filter: dataset: mmmu

2026-05-20 · 12 rows

15:46 · VideoLLaMA3 2B · MMMU · 45.3% · -40.70 · source ↗ · unverified
15:46 · Qwen2-VL 72B · MMMU · 64.5% · -21.50 · source ↗ · unverified
15:46 · Qwen2-VL 7B · MMMU · 54.1% · -31.90 · source ↗ · unverified
15:46 · Qwen2-VL-2B · MMMU · 41.1% · -44.90 · source ↗ · unverified
15:46 · Gemini 2.5 Flash · MMMU · 79.7% · -6.30 · source ↗ · unverified
15:46 · Gemini 2.5 Pro · MMMU · 82.0% · -4.00 · source ↗ · unverified
15:46 · Qwen3-Omni-30B-A3B-Base-202507 · MMMU · 59.3% · -26.67 · source ↗ · unverified
15:46 · Gemma 3 (27B, IT) · MMMU · 64.9% · -21.10 · source ↗ · unverified
15:46 · BLIP3-o (8B) · MMMU · 50.6% · -35.40 · source ↗ · unverified
15:46 · BAGEL (7B MoT) · MMMU · 55.3% · -30.70 · source ↗ · unverified
15:46 · InternVL3-78B · MMMU · 72.2% · -13.80 · source ↗ · unverified
15:45 · MiniMax-VL-01 · MMMU · 68.5% · -17.50 · source ↗ · unverified

2026-04-23 · 7 rows

18:57 · Qwen3.5-27B · MMMU · 82.3% · -3.70 · source ↗ · verified · dated 2025-09-01
18:57 · Qwen3.5-122B-A10B · MMMU · 83.9% · -2.10 · source ↗ · verified · dated 2025-09-01
18:57 · Qwen3.5-397B-A17B · MMMU · 83.9% · -2.10 · source ↗ · verified · dated 2025-09-01
18:57 · GPT-5.1 · MMMU · 85.4% · -0.60 · source ↗ · verified · dated 2025-11-13
18:57 · GPT-5.1 Instant · MMMU · 85.4% · -0.60 · source ↗ · verified · dated 2025-11-13
18:57 · GPT-5.1 Thinking · MMMU · 85.4% · -0.60 · source ↗ · verified · dated 2025-11-13
18:57 · Qwen3.6 Plus · MMMU · 86.0% · NEW SOTA · +12.70 · source ↗ · verified · dated 2026-03-15

2026-03-27 · 11 rows

03:17 · InternVL3-78B · MMMU · 73.3% · NEW SOTA · +1.40 · no source · unverified · dated 2025-01-22
03:17 · Gemini 2.0 Flash · MMMU · 71.9% · NEW SOTA · +1.70 · source ↗ · verified · dated 2025-01-15
03:17 · Qwen2.5-VL 72B · MMMU · 70.2% · NEW SOTA · +1.10 · source ↗ · verified · dated 2025-02-19
03:17 · Qwen2-VL 72B · MMMU · 64.5% · -4.60 · source ↗ · verified · dated 2024-09-18
03:17 · InternVL2-76B · MMMU · 67.4% · -1.70 · source ↗ · verified · dated 2024-04-25
03:17 · GPT-4o · MMMU · 69.1% · NEW SOTA · +0.80 · source ↗ · verified · dated 2024-10-25
03:17 · Claude 3.5 Sonnet · MMMU · 68.3% · NEW SOTA · +6.10 · source ↗ · verified · dated 2024-10-22
03:17 · Llama 3.2 Vision 90B · MMMU · 60.3% · -1.90 · source ↗ · verified · dated 2024-07-31
03:17 · Gemini 1.5 Pro · MMMU · 62.2% · NEW SOTA · +2.80 · source ↗ · verified · dated 2024-02-15
03:17 · Claude 3 Opus · MMMU · 59.4% · NEW SOTA · +2.60 · source ↗ · verified · dated 2024-03-04
03:17 · GPT-4V · MMMU · 56.8% · NEW SOTA · first result · source ↗ · verified · dated 2023-03-15

Showing the 200 most-recent rows. To inspect a single dataset’s history, append ?dataset=ID to the URL (e.g. /log?dataset=mmmu). The delta compares each row's value with the best value on the same dataset at the moment the row was added. Hidden datasets and hidden models are not shown.
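The delta and NEW SOTA labels can be reproduced by replaying the ledger oldest-first and keeping a running best per dataset. A minimal sketch, assuming higher scores are better (true for MMMU accuracy) and that a tie with the prior best does not count as a new SOTA; `Entry`, `Annotated` and `annotate` are hypothetical names, not Codesota's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    model: str
    dataset: str
    value: float  # assumed higher-is-better, e.g. MMMU accuracy in %


@dataclass(frozen=True)
class Annotated:
    entry: Entry
    delta: Optional[float]  # None stands for "first result" on the dataset
    new_sota: bool


def annotate(entries: list[Entry]) -> list[Annotated]:
    """Replay entries oldest-first, comparing each one with the best value
    already logged for the same dataset when it was added."""
    best: dict[str, float] = {}
    out: list[Annotated] = []
    for e in entries:
        prior = best.get(e.dataset)
        if prior is None:
            # First row on this dataset: no delta, and it is trivially SOTA.
            out.append(Annotated(e, None, True))
            best[e.dataset] = e.value
            continue
        delta = round(e.value - prior, 2)  # two decimals, as the log displays
        new_sota = e.value > prior  # assumption: a tie is not a new SOTA
        out.append(Annotated(e, delta, new_sota))
        if new_sota:
            best[e.dataset] = e.value
    return out


# The four oldest rows of the 2026-03-27 batch, in the order they were added.
rows = [
    Entry("GPT-4V", "mmmu", 56.8),
    Entry("Claude 3 Opus", "mmmu", 59.4),
    Entry("Gemini 1.5 Pro", "mmmu", 62.2),
    Entry("Llama 3.2 Vision 90B", "mmmu", 60.3),
]
for a in annotate(rows):
    label = "first result" if a.delta is None else f"{a.delta:+.2f}"
    print(a.entry.model, label, "NEW SOTA" if a.new_sota else "")
```

On these four rows the sketch gives the same labels the log shows: first result, +2.60, +2.80, and -1.90 with no NEW SOTA mark for Llama 3.2 Vision 90B. Since the log displays rounded scores, a delta recomputed from the displayed value can differ slightly from the logged one: 59.3% against a prior best of 86.0% gives -26.70, while the Qwen3-Omni row shows -26.67.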