How It Works

The Data Flywheel

CodeSOTA starts where Papers with Code left off. We're building the definitive ML benchmark resource through community contributions and verified results.

1. Seed Data

PWC's 79,817 papers and 9,327 benchmarks form our foundation: seven years of ML research, preserved and indexed.

2. Fresh Verification

We run benchmarks ourselves. Flag stale claims. Update results that papers got wrong. No more trusting 3-year-old numbers.
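
To make this concrete, here is a minimal Python sketch of the comparison at the heart of verification: rerun a model, compare the reproduced score against the claim, and flag the drift. The function name, tolerance, and labels are illustrative assumptions, not our actual pipeline.

    # Minimal sketch of claim verification. The tolerance and labels are
    # illustrative assumptions, not CodeSOTA's real pipeline.
    def verify_claim(claimed: float, reproduced: float,
                     tolerance: float = 0.5) -> str:
        """Classify a leaderboard claim by how far a re-run drifts from it."""
        drift = reproduced - claimed
        if abs(drift) <= tolerance:
            return "verified"      # matches within tolerance
        if drift < 0:
            return "stale"         # claim no longer reproducible
        return "understated"       # re-run beats the published number

    # Example: a paper claims 94.1 accuracy, the re-run gets 91.8
    print(verify_claim(94.1, 91.8))  # -> "stale"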

3. Community Submissions

Researchers submit new results. Vendors update their model scores. The community catches errors and fills gaps.

4. Continuous Improvement

Better data attracts more users. More users submit more results. The cycle accelerates. Quality compounds.

Seed → Verify → Submit → Improve → Repeat

Where We Are Now

  • 1,519 PWC results indexed
  • 146 datasets covered
  • 16+ OCR models verified

Live now:

  • OCR benchmarks - 16+ models, 9 datasets
  • Browse system - PWC archive data integrated
  • SOTA timeline charts on dataset pages
  • Result submission forms on every benchmark
  • Open JSON API at /data/

Coming next:

  • Speech recognition benchmarks
  • Code generation leaderboards
  • Automated benchmark verification pipeline
  • Community voting on result accuracy
  • Contributor profiles and attribution

How to Contribute

Submit New Results

Got a new model result? Published a paper with benchmark scores? Every dataset page has a submission form at the bottom. We review submissions and add verified results to the leaderboards.
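
As a checklist, here is the information a strong submission carries. The field names below are hypothetical, so follow the labels on the actual form; the point is what to have ready.

    # Hypothetical sketch of a complete submission; the real form fields
    # may differ. Treat this as a checklist, not a schema.
    submission = {
        "model": "ExampleOCR-Large",               # model name as published
        "dataset": "FUNSD",                        # benchmark dataset
        "metric": "F1",                            # metric the score reports
        "score": 91.2,                             # claimed result
        "paper_url": "https://arxiv.org/abs/...",  # where the number comes from
        "code_url": "https://github.com/...",      # optional, speeds verification
    }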

Browse datasets to submit results

Report Issues

Found an incorrect score? Spotted a broken paper link? Results that don't match the source? Use the "Report Issue" button on any dataset page. Accurate data is the foundation.

Report an issue

Build on the Data

All our data is open: the PWC archive, our benchmarks, everything is JSON you can fetch and use. Build analysis tools, research dashboards, or integrate it into your workflow.

  • /data/pwc-archive.json
  • /data/benchmarks.json
  • /data/datasets.json
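
A few lines of Python are enough to pull the data down. The paths are the endpoints listed above; the base URL here is a placeholder, and the code assumes nothing about the record layout, so inspect the JSON before relying on any field.

    # Fetch the open benchmark data and inspect its structure. The paths
    # are the endpoints listed above; the base URL is a placeholder.
    import json
    from urllib.request import urlopen

    BASE = "https://codesota.example"  # assumed host; use the site you browse

    with urlopen(f"{BASE}/data/benchmarks.json") as resp:
        benchmarks = json.load(resp)

    # Print the first record (truncated) to see the actual schema
    if isinstance(benchmarks, list) and benchmarks:
        print(json.dumps(benchmarks[0], indent=2)[:500])
    else:
        print(json.dumps(benchmarks, indent=2)[:500])

From there you can cache the files locally and join them on dataset names for whatever dashboard or analysis you're building.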

Spread the Word

The flywheel only works with community participation. Share CodeSOTA with your research group, link to it in your papers, mention it when discussing benchmarks. More users = better data for everyone.

Why This Matters

"When Meta shut down Papers with Code, it proved that critical research infrastructure can't depend on corporate goodwill. The ML community needs benchmark tracking that's community-owned and community-driven."

The old model (broken)

  • Company acquires community resource
  • Promises to keep it open "forever"
  • Priorities change, resource gets shut down
  • Community scrambles to preserve data

The new model (sustainable)

  • Open data from day one
  • Community contributions drive growth
  • No single point of failure
  • Value compounds over time

Built on Papers with Code

We're not starting from scratch. CodeSOTA is built on the preserved Papers with Code dataset - 79,817 papers, 9,327 benchmarks, and 5,628 datasets from seven years of ML research. This foundation lets us focus on what PWC couldn't do: continuous verification and community-driven updates.

What we inherited → What we're adding

  • Historical benchmark results → Fresh verification (Dec 2025)
  • Paper-to-code linkages → Working code verification
  • Static leaderboards → Community submissions + voting
  • Aggregated claims → We run benchmarks ourselves
  • Research-focused metrics → Practical recommendations

Join the Flywheel

Every submission improves the data. Every issue report increases accuracy. Every user makes the flywheel spin faster.