Codesota · Registry log · 9,082 rows · 7,134 new this month · Showing 9
Every score we've added, in order.
The registry log is the append-only public ledger of every benchmark result on Codesota. Each row records when it was written, the date of the result itself, the model, the claimed value, and where the citation lives. New-SOTA rows are marked in colour; unverified rows are still shown, but labelled as such.
This is the audit trail. If a score is wrong, this is where the error will be visible; if a source is missing, this is where you'll see the gap.
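The fields a row carries can be pictured as a small immutable record. The sketch below is illustrative only: the class name `LogRow`, its field names, and the `label` helper are assumptions made for this example, not Codesota's actual schema.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)  # frozen: rows in an append-only log are never edited
class LogRow:
    """Hypothetical shape of one ledger row (all names are assumptions)."""

    written_at: datetime       # when the row was added to the log
    result_date: date          # when the result itself is dated
    model: str                 # who the model was
    dataset: str               # which benchmark the value belongs to
    value: float               # the claimed score
    source_url: Optional[str]  # where the citation lives, if any
    verified: bool             # unverified rows are shown, but labelled

    def label(self) -> str:
        """Render the trailing status fields the way the log displays them."""
        source = "source" if self.source_url else "no source"
        status = "verified" if self.verified else "unverified"
        return f"{source} · {status} · dated {self.result_date.isoformat()}"


row = LogRow(
    written_at=datetime(2026, 3, 27, 3, 17),
    result_date=date(2025, 1, 22),
    model="InternVL3-78B",
    dataset="MMMU",
    value=73.3,
    source_url=None,
    verified=False,
)
print(row.label())  # no source · unverified · dated 2025-01-22
```

Keeping the write time and the result date as separate fields is what lets a row added today describe a result published months earlier.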
2026-03-27 · 8 rows
- 03:17 · InternVL3-78B · MMMU · 73.3% · NEW SOTA · +1.40 · no source · unverified · dated 2025-01-22
- 03:17 · Gemini 2.0 Flash · MMMU · 71.9% · NEW SOTA · +1.70 · source ↗ · verified · dated 2025-01-15
- 03:17 · Qwen2.5-VL 72B · MMMU · 70.2% · NEW SOTA · +1.10 · source ↗ · verified · dated 2025-02-19
- 03:17 · GPT-4o · MMMU · 69.1% · NEW SOTA · +0.80 · source ↗ · verified · dated 2024-10-25
- 03:17 · Claude 3.5 Sonnet · MMMU · 68.3% · NEW SOTA · +6.10 · source ↗ · verified · dated 2024-10-22
- 03:17 · Gemini 1.5 Pro · MMMU · 62.2% · NEW SOTA · +2.80 · source ↗ · verified · dated 2024-02-15
- 03:17 · Claude 3 Opus · MMMU · 59.4% · NEW SOTA · +2.60 · source ↗ · verified · dated 2024-03-04
- 03:17 · GPT-4V · MMMU · 56.8% · NEW SOTA · first result · source ↗ · verified · dated 2023-03-15
Showing the 200 most recent rows. To inspect a single dataset’s history, append ?dataset=ID (e.g. /log?dataset=mmmu). The delta compares each row's value with the best prior value on the same dataset at the moment the row was added; the first row on a dataset has no delta and is marked "first result". Hidden datasets and hidden models are not shown.
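The delta rule can be sketched in a few lines. This is a minimal illustration, not the site's implementation: the function name `annotate_deltas` and the tuple layout are hypothetical, and it assumes that higher values are better and that rows arrive in the order they were added (the log lists them newest first, so the sample is reversed here).

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, float]  # (model, dataset, value); layout is an assumption


def annotate_deltas(
    rows: List[Row],
) -> List[Tuple[str, str, float, Optional[float], bool]]:
    """Attach (delta, is_new_sota) to each row, processed in insertion order."""
    best: Dict[str, float] = {}  # best value seen so far, per dataset
    out = []
    for model, dataset, value in rows:
        prior = best.get(dataset)
        # A first result has no prior best, so it has no delta.
        delta = None if prior is None else round(value - prior, 2)
        is_new_sota = prior is None or value > prior
        if is_new_sota:
            best[dataset] = value
        out.append((model, dataset, value, delta, is_new_sota))
    return out


# Four MMMU values from the log, oldest insertion first (order assumed).
rows = [
    ("GPT-4V", "MMMU", 56.8),
    ("Claude 3 Opus", "MMMU", 59.4),
    ("Gemini 1.5 Pro", "MMMU", 62.2),
    ("Claude 3.5 Sonnet", "MMMU", 68.3),
]
for model, dataset, value, delta, is_new_sota in annotate_deltas(rows):
    shown = "first result" if delta is None else f"{delta:+.2f}"
    print(model, dataset, f"{value}%", shown, "NEW SOTA" if is_new_sota else "")
```

Under these assumptions the sketch reproduces the deltas shown for those rows (+2.60, +2.80, +6.10). A row that does not beat the prior best would get a negative or zero delta and no NEW SOTA mark.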