Codesota · About
A note from the editor
Issue: April 22, 2026

A registry, not a leaderboard.

Codesota is an open, dated record of machine-learning benchmarks. It is maintained by one person, in public, without payment for listings. This page is the short statement of what it is, what it is not, and who keeps it.

Started 10 December 2025 after Meta retired Papers with Code. The goal is a calmer, stricter record — one that survives its authors.

§ 01 · Intent

Why the record matters.

In July 2025 Meta retired Papers with Code. A seven-year archive of leaderboards, submissions and citations went offline without notice, and the field was briefly reminded that benchmark infrastructure, however load-bearing it felt, had been sitting on a single corporate hosting decision the whole time. Codesota is the attempt to rebuild that infrastructure in a way that cannot be retired by a press release.

It is, plainly, a website and a JSON file. Each benchmark is a task-dataset-metric triple with a declared direction, a fixed split, a reproducibility package and a dated submission. Each row carries a verification tier — self-reported, community-reproduced, or Codesota-reproduced — and each score is stamped with the day it was run. The full standard is on the methodology page; this page is the editorial note behind it.

The philosophical commitment is modest: a registry, not a leaderboard. A leaderboard is a view of “who is on top right now”; a registry is the record that makes that view legible. When a model regresses between checkpoints, the preceding score stays visible so the regression itself is visible. When a score turns out to be wrong, the correction is visible too. Nothing is silently deleted.
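
To make that concrete, here is a minimal sketch of two registry rows, written as Python literals mirroring the JSON: a row and its later correction, side by side. The field names are assumptions made for this note, not the published schema; the methodology page is authoritative.

    # Illustrative sketch only: field names are assumptions for this note,
    # not the published Codesota schema (see the methodology page).
    registry = [
        {
            "task": "image-classification",   # the task-dataset-metric triple
            "dataset": "imagenet-1k",
            "metric": "top-1-accuracy",
            "direction": "higher-is-better",  # declared direction
            "split": "validation",            # fixed split
            "score": 89.1,
            "run_date": "2026-01-14",         # the day the score was run
            "tier": "self-reported",          # verification tier
        },
        {
            # A correction is appended as a new dated row; the earlier row
            # stays visible, so the correction itself stays visible.
            "task": "image-classification",
            "dataset": "imagenet-1k",
            "metric": "top-1-accuracy",
            "direction": "higher-is-better",
            "split": "validation",
            "score": 87.6,
            "run_date": "2026-02-02",
            "tier": "codesota-reproduced",
            "corrects": "2026-01-14",         # hypothetical back-reference
        },
    ]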

The practical commitment is equally modest: no payment for listings, no sponsored rankings, no investor behind the curtain. If paid work ever exists — consulting, a commissioned evaluation — it is disclosed inline on the page where the result appears. The public registry is not for sale.

Codesota does not need to be large to be useful. It needs to be honest, dated, and available to anyone who wants the JSON.

§ 02 · Refusals

What Codesota is not.

Worth stating plainly, since the category of “ML benchmarks” now contains a great many things that look similar and behave differently.

  • 01
    Not a startup.
    Codesota is not a company raising a round. It has no investors, no board, no growth targets. The registry is the point, not the vehicle.
  • 02
    Not a press-release aggregator.
    A new model announcement does not automatically become a row. A score becomes a row when it has a reproduction path and a dated submission — until then it is, at most, a claim.
  • 03
    Not an editorial board.
    There is one editor. Submissions do not go through a review panel; they go through a reproduction run. The methodology page documents the procedure in full.
  • 04
    Not leaderboard theatre.
    We do not re-rank scores to flatter a vendor, re-run an evaluation until the number improves, or quietly delete rows that age badly. The record is append-only in spirit.
  • 05
    Not for sale.
    There is no paid placement, no sponsored ranking, no promoted row. If paid work ever happens — consulting, custom benchmarking — it is labelled on the page it touches and does not reorder the public registry.
  • 06
    Not a replacement for the paper.
    The registry is a record of results; the primary literature is still the primary literature. Every row links back to its source, and the source is what you cite.

§ 03 · Masthead

Who runs this.

Codesota is written and maintained by Kacper Wikiel, in Warsaw. The repository, the registry and the site are one codebase, public on GitHub.

Day-to-day work is writing editorial pages, running reproductions on submitted checkpoints, and curating the task taxonomy. Contributors who submit reproduction runs or corrections are credited on the row rather than abstracted into a masthead count.

The project is self-directed. It is not funded by a venture round or a grant. When paid work exists — consulting on benchmark design, commissioned evaluations, custom profiling of specific models — it is declared on the page it affects, and it does not change the ordering of the public registry. If that arrangement ever changes, the change will be announced on this page before any row is affected.

Editorial judgement — which benchmarks to include, how to describe them, how to contextualise a score — is, and remains, the responsibility of the editor. Dissent is welcome in GitHub issues; the record is public either way.

§ 04 · Contact

How to write in.

Four routes, in order of preference.

  1. 01
    Open a GitHub issue.
    Corrections, new benchmark proposals, methodology disputes. The repo is github.com/kwikiel/codesota. Public, so the discussion is preserved.
  2. 02
    Submit a result.
    If you have trained a model and have a reproducibility package ready, the submission form is the right door. We run it, publish the score, and date the row.
  3. 03
    Partnership or consulting.
    Benchmark-design work, custom evaluations, commissioned profiling — see /consulting. Listing placement is not part of the arrangement.
  4. 04
    Press / republication.
    Registry data is published under CC BY 4.0. Cite codesota.com with the snapshot date; the JSON at /data/benchmarks.json is the canonical form. A minimal fetch sketch follows this list.
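
For scripts, a minimal sketch under the terms of item 04. The endpoint is the one named above; the top-level shape of the JSON is an assumption made for illustration, not a documented guarantee.

    import json
    from datetime import date
    from urllib.request import urlopen

    # Canonical registry location, as stated on this page.
    URL = "https://codesota.com/data/benchmarks.json"

    with urlopen(URL) as resp:        # one dated snapshot of the registry
        registry = json.load(resp)

    # Republication terms above: CC BY 4.0; cite codesota.com with the
    # snapshot date. len() assumes a top-level list, which is an
    # assumption, not part of the stated contract.
    print(f"codesota.com snapshot · fetched {date.today().isoformat()}")
    print(f"{len(registry)} rows in this snapshot")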

All routes verified live · April 2026