Every score we've added, in order.

This is the append-only public ledger of every benchmark result on Codesota. Each row records when it was written, the date of the result itself, the model, the claimed value, and where the citation lives. New-SOTA rows are marked in colour; unverified rows are still shown, but labelled as such.

This is the audit trail. If a score is wrong, this is where the error will be visible; if a source is missing, this is where you'll see the gap.

2026-05-26 · 1 row

13:27 · Gemini 3.1 Pro · LiveCodeBench Pro · 2887.00 · NEW SOTA · +448.00 · source ↗ · verified

2026-05-20 · 1 row

09:38 · internlm2-1_8b · Open PL LLM Leaderboard · 60296.30 · NEW SOTA · +44154.53 · source ↗ · verified

Showing the 200 most recent rows. To inspect a single dataset’s history, append ?dataset=ID (e.g. /log?dataset=mmmu). Delta compares each row to the prior-best value on the same dataset at the moment the row was added. Hidden datasets and hidden models are not shown.
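
The delta rule can be sketched in a few lines of Python. This is a minimal illustration, not Codesota's actual implementation: the `Row` fields and the `annotate_deltas` helper are hypothetical names, it assumes higher values are better, and it assumes the first row on a dataset counts as new SOTA with no delta. The earlier row in the example is invented purely so that the prior best (2439.00) matches the +448.00 shown in the log.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Field names are hypothetical; the real schema is not published here.
    added_at: str   # ISO timestamp of when the row was written
    dataset: str
    model: str
    value: float


def annotate_deltas(rows):
    """Return (row, delta, is_new_sota) tuples in insertion order.

    Assumes higher is better. The delta is measured against the best
    value already logged for the same dataset when the row was added.
    """
    best = {}  # dataset -> best value seen so far
    out = []
    for row in sorted(rows, key=lambda r: r.added_at):
        prior = best.get(row.dataset)
        delta = None if prior is None else row.value - prior
        is_new_sota = prior is None or row.value > prior
        if is_new_sota:
            best[row.dataset] = row.value
        out.append((row, delta, is_new_sota))
    return out


rows = [
    # Hypothetical earlier row, chosen so the prior best is 2439.00.
    Row("2026-01-01T00:00", "LiveCodeBench Pro", "earlier-model", 2439.00),
    # The row shown in the log above.
    Row("2026-05-26T13:27", "LiveCodeBench Pro", "Gemini 3.1 Pro", 2887.00),
]
result = annotate_deltas(rows)
for row, delta, is_new_sota in result:
    print(row.model, row.value, delta, is_new_sota)
```

Because the comparison uses the best value at the moment of insertion, a delta never changes retroactively when later rows arrive, which fits an append-only ledger.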