Why these numbers can be trusted.
Most leaderboards are a ledger of claims. Authors submit a number, a banner appears; the number stands until the next banner appears. No reviewer, no reproduction, no retraction. Codesota is different in three ordinary ways.
First, every submission carries code. Not a repository link alone — a frozen commit, a declared environment, a recorded seed. If the code does not run, the row does not publish.
Second, every benchmark has a metric direction. It sounds trivial. It is not. Half the confusion in the field comes from tables that do not say whether higher is better; ours do.
Third, every score carries a date and survives its author. When a model regresses — and they do regress — the record is preserved. The table never silently forgets.
We are building the registry we wish we had when we were training the models ourselves.