Codesota · Registry log · 9,082 rows · 7,134 new this month · Showing 22

Every score we've added, in order.

The append-only public ledger of every benchmark result on Codesota. When a row was written, when the result itself is dated, who the model was, what value was claimed, and where the citation lives. New-SOTA rows are marked in colour; unverified rows still show, but labelled.
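
As a rough illustration of the fields named above, one ledger row could be modelled as below. This is a sketch, not Codesota's actual schema: the class name `LogRow`, every field name, and the example URL are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class LogRow:
    """One ledger entry. All names here are assumptions, not the site's schema."""
    written_at: datetime                 # when the row was appended to the log
    model: str                           # who the model was
    dataset: str                         # benchmark the score belongs to
    value: float                         # the claimed score
    source_url: Optional[str]            # where the citation lives; None exposes a gap
    verified: bool                       # unverified rows still show, but labelled
    result_date: Optional[date] = None   # when the result itself is dated, if known
    new_sota: bool = False               # marked in colour on the page
    delta: Optional[float] = None        # None for a first result on a dataset

    def label(self) -> str:
        """Render the trailing status labels of a row."""
        parts = ["verified" if self.verified else "unverified"]
        if self.result_date is not None:
            parts.append(f"dated {self.result_date.isoformat()}")
        return " · ".join(parts)


# Values taken from a row in the log; the URL is a placeholder.
row = LogRow(
    written_at=datetime(2026, 4, 23, 20, 40),
    model="Claude Opus 4.7",
    dataset="SWE-Bench Verified",
    value=87.6,
    source_url="https://example.com/source",
    verified=True,
    result_date=date(2026, 4, 18),
    new_sota=True,
    delta=6.70,
)
print(row.label())  # verified · dated 2026-04-18
```

Keeping `written_at` and `result_date` as separate fields mirrors the distinction the log draws between when a row was written and when the result itself is dated.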

This is the audit trail. If a score is wrong, this is where the error will be visible; if a source is missing, this is where you'll see the gap.

2026-04-23 · 10 rows
  1. 20:40 · Gemini 3 Pro Preview · LiveCodeBench · 91.7% · NEW SOTA · +6.70 · source ↗ · verified · dated 2026-03-15
  2. 20:40 · Claude Opus 4.7 · SWE-Bench Verified · 87.6% · NEW SOTA · +6.70 · source ↗ · verified · dated 2026-04-18
  3. 18:58 · Gemini 3.1 Pro Preview · MMMU-Pro · 82.0% · NEW SOTA · first result · source ↗ · verified · dated 2026-03-18
  4. 18:57 · Qwen3.6 Plus · MMMU · 86.0% · NEW SOTA · +12.70 · source ↗ · verified · dated 2026-03-15
  5. 10:51 · Claude Sonnet 5 · SWE-Bench · 82.1% · NEW SOTA · first result · source ↗ · verified · dated 2026-02-01
  6. 10:51 · SENet · ImageNet · 97.8% · NEW SOTA · +1.32 · source ↗ · verified · dated 2017-01-01
  7. 10:51 · ResNet-152 · ImageNet · 96.4% · NEW SOTA · +3.13 · source ↗ · verified · dated 2015-01-01
  8. 10:51 · GoogLeNet · ImageNet · 93.3% · NEW SOTA · +2.30 · source ↗ · verified · dated 2014-01-01
  9. 10:51 · CoCa (ViT-G/14) · ImageNet · 91.0% · NEW SOTA · first result · source ↗ · verified · dated 2022-05-01
  10. 10:51 · Vega v2 (6B) · GLUE · 91.3% · NEW SOTA · first result · source ↗ · verified · dated 2022-10-01
2026-04-13 · 4 rows
  1. 23:16 · LlamaParse Agentic · ParseBench · 84.9% · NEW SOTA · +13.00 · source ↗ · verified
  2. 23:16 · LlamaParse Cost Effective · ParseBench · 71.9% · NEW SOTA · +0.90 · source ↗ · verified
  3. 23:16 · Gemini 3 Flash · ParseBench · 71.0% · NEW SOTA · +24.20 · source ↗ · verified
  4. 23:16 · GPT-5-mini · ParseBench · 46.8% · NEW SOTA · first result · source ↗ · verified
2026-04-12 · 1 row
  1. 20:20 · Gemini 3 Pro · LiveCodeBench Pro · 2439.00 · NEW SOTA · first result · source ↗ · verified
2026-04-09 · 7 rows
  1. 02:00 · GPT-2-Large (prefix-tuning) · e2e · 71.7% · NEW SOTA · +0.30 · source ↗ · verified · dated 2021-07-14
  2. 02:00 · GPT-2-Medium (prefix-tuning) · e2e · 71.4% · NEW SOTA · +0.40 · source ↗ · verified · dated 2021-07-14
  3. 02:00 · GPT-2-Medium (fine-tuning) · e2e · 71.0% · NEW SOTA · +0.20 · source ↗ · verified · dated 2021-07-14
  4. 01:58 · Oracle-BERT (HowSumm-Method) · howsumm-method · 63.2% · NEW SOTA · +4.30 · source ↗ · verified
  5. 01:58 · Oracle-BOW (HowSumm-Method) · howsumm-method · 58.9% · NEW SOTA · +5.40 · source ↗ · verified
  6. 01:57 · Oracle-BERT · howsumm-step · 46.8% · NEW SOTA · +0.80 · source ↗ · verified · dated 2021-10-07
  7. 01:57 · Oracle-BOW · howsumm-step · 46.0% · NEW SOTA · +6.40 · source ↗ · verified · dated 2021-10-07
Showing the 200 most recent rows. To inspect a single dataset’s history, append ?dataset=ID to the log URL (e.g. /log?dataset=mmmu). Delta compares each row's value with the prior-best value on the same dataset at the moment the row was added; a row with no prior value on its dataset is marked "first result". Hidden datasets and hidden models are not shown.
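
The delta rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the site's implementation: the function names `annotate_deltas` and `log_url` are hypothetical, higher scores are assumed to be better, and the four ParseBench rows (all stamped 23:16) are assumed to have been added in bottom-to-top order.

```python
from typing import Optional
from urllib.parse import urlencode

Row = tuple[str, str, float]  # (dataset, model, value), in the order rows were added
Annotated = tuple[str, str, float, Optional[float], bool]


def annotate_deltas(rows: list[Row]) -> list[Annotated]:
    """Attach (delta, new_sota) to each row, comparing against the prior best
    value on the same dataset at the moment the row is added.
    Assumption: higher is better for every dataset."""
    best: dict[str, float] = {}
    out: list[Annotated] = []
    for dataset, model, value in rows:
        prior = best.get(dataset)
        if prior is None:
            delta, new_sota = None, True          # "first result"
        else:
            delta = round(value - prior, 2)       # round away float noise
            new_sota = value > prior
        if new_sota:
            best[dataset] = value
        out.append((dataset, model, value, delta, new_sota))
    return out


def log_url(dataset_id: str) -> str:
    """Build the per-dataset history path described in the footer note."""
    return "/log?" + urlencode({"dataset": dataset_id})


# ParseBench values from the log, oldest first (insertion order is an assumption).
annotated = annotate_deltas([
    ("ParseBench", "GPT-5-mini", 46.8),
    ("ParseBench", "Gemini 3 Flash", 71.0),
    ("ParseBench", "LlamaParse Cost Effective", 71.9),
    ("ParseBench", "LlamaParse Agentic", 84.9),
])
for dataset, model, value, delta, new_sota in annotated:
    shown = "first result" if delta is None else f"{delta:+.2f}"
    print(f"{model}: {value}% {shown}")

print(log_url("mmmu"))  # /log?dataset=mmmu
```

Under those assumptions the computed deltas come out as first result, +24.20, +0.90 and +13.00, matching the values shown for the 2026-04-13 ParseBench rows.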