Codesota · How it works
9,168 results · 303 added this month · Updated 2026-05-13
§ 00 · The flywheel

How Codesota compounds.

Papers with Code gave the field a shape: the <Task, Dataset, Metric> tuple, the leaderboard, the code link. We inherit that shape and add three things it was missing — provenance for every score, an append-only audit log, and a paper-discovery feed keyed to verified scores, not press releases.

Each of those three compounds on the other two. A verified score writes a row in the log; the log makes the paper it came from richer on /papers; the paper’s leaderboard entry makes the score easier to verify next time.
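
Concretely, one registry row reduces to something like the sketch below. The field names are guesses assembled from the descriptions on this page, not the actual benchmark_results schema:

    from dataclasses import dataclass
    from datetime import date

    # Illustrative shape of one registry row; field names are guesses
    # from this page's descriptions, not the real benchmark_results schema.
    @dataclass
    class BenchmarkResult:
        task: str           # e.g. "Visual Question Answering"
        dataset: str        # e.g. "VQA v2"
        metric: str         # e.g. "overall accuracy"
        model: str          # the system that produced the score
        value: float        # the claimed score
        paper_url: str      # the paper making the claim
        source_url: str     # where the number can be checked
        result_date: date   # when the result was reported
        verified: bool      # did an editor confirm the claim?
        is_sota: bool       # best verified score at insert time?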

§ 01 · The cycle

Four turns. Each feeds the next.

  01 · Seed from PWC

    79,817 papers · 9,327 benchmarks · 5,628 datasets from seven years of Papers with Code. Frozen source, trust grade ≈ C.

    Inherited · read-only
  02 · Verify & date

    New scores arrive through submissions, arXiv extraction and editor audits. Each row records source URL, verification date, and whether we believed the claim.

    93% of current rows verified
  03 · Append to log

    /log records every benchmark_results insert, newest first, grouped by day. New-SOTA rows are marked; unverified rows still appear, flagged. (A sketch of the append step follows this list.)

    303 rows added in last 30 days
  04 · Publish in /papers

    Any paper with a verified score surfaces on /papers with its top leaderboard entry and code link. No paper shows up without at least one score we recorded.

    34,667 papers discoverable
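
A minimal sketch of the append step in turn 03, assuming a SQLite-style benchmark_results table with the columns named above and a higher-is-better metric; the real storage layer isn’t specified on this page:

    import sqlite3

    def append_result(db: sqlite3.Connection, row: dict) -> None:
        """Append-only: rows are inserted, never updated or deleted in place."""
        # Prior best verified score for this benchmark (assumes higher is better).
        prior = db.execute(
            "SELECT MAX(value) FROM benchmark_results"
            " WHERE benchmark = ? AND verified = 1",
            (row["benchmark"],),
        ).fetchone()[0]
        row["is_sota"] = int(prior is None or row["value"] > prior)
        db.execute(
            "INSERT INTO benchmark_results"
            " (benchmark, model, value, source_url, result_date, verified, is_sota)"
            " VALUES (:benchmark, :model, :value, :source_url, :result_date, :verified, :is_sota)",
            row,
        )
        db.commit()
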
§ 02 · State of the build

What’s shipped, what’s next.

Live · shipping
  • /log

    Append-only ledger of every benchmark result, with delta-vs-prior-SOTA and source URL. (Delta computation sketched after this list.)

  • /papers

    Paper discovery keyed to verified scores — only papers with at least one recorded benchmark result.

  • /lineage/vqa

    Editorial lineage graph: attention path + branches with live SOTA pulled from the registry.

  • /llm

    LLM-specific leaderboard with reasoning, code and math subtask breakdowns.

  • /ocr

    OCR benchmarks — layout, handwriting, table structure across 16+ models.

  • /submit (gated)

    Signed-in users submit paper + score; we review and append to the log.
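
How the delta-vs-prior-SOTA column on /log could be computed: a sketch assuming higher-is-better metrics and rows that carry the value, verified and result_date fields described above; the actual implementation may differ.

    def deltas_vs_prior_sota(rows: list[dict]) -> list[dict]:
        """Annotate each row with its gap to the best verified score before it."""
        best, out = None, []
        for r in sorted(rows, key=lambda r: r["result_date"]):
            delta = None if best is None else round(r["value"] - best, 4)
            out.append({**r, "delta_vs_sota": delta})
            if r["verified"] and (best is None or r["value"] > best):
                best = r["value"]
        return out[::-1]  # /log displays newest first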

On the roadmap
  • LLM-assisted extraction with human review

    Paper PDFs → structured score extraction → queued on /dashboard/review → editor approves each row before append.

  • Per-paper detail pages

    /paper/<id> with every claimed score, each marked verified/unverified and cross-linked to the active leaderboard.

  • Rich-link cards (og:image, oEmbed)

    When a Codesota benchmark URL is shared anywhere, it renders a live SOTA card instead of a plain link.

  • arXiv annotator extension

    Overlay verified SOTA context on arXiv abstract pages — the long bet.

  • Freshness cron + flag button

    Weekly audit for NULL source_url / result_date, plus a user-facing flag-a-score button. (Audit query sketched after this list.)
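
The freshness audit on the roadmap might reduce to a query like this; table and column names mirror the item above and are assumptions:

    import sqlite3

    def weekly_freshness_audit(db: sqlite3.Connection) -> list[tuple]:
        """Rows missing the provenance fields every entry is supposed to carry."""
        return db.execute(
            "SELECT rowid, benchmark, model FROM benchmark_results"
            " WHERE source_url IS NULL OR result_date IS NULL"
        ).fetchall()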

§ 03 · Contribute

Three ways to move the needle.

Submit

New score for a tracked benchmark

Paper URL + model + score + source. We verify and append — usually within 48h. Signed-in users only; this is the fastest way to write to the log.

Submit a result →
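
For orientation, a submission reduces to roughly the payload below. Every value is a placeholder, and the actual /submit form may ask for more or different fields:

    # Illustrative submission payload; the real /submit form fields may differ.
    submission = {
        "paper_url": "https://arxiv.org/abs/2405.00000",  # placeholder ID
        "model": "MyModel-7B",                            # placeholder name
        "benchmark": "VQA v2",                            # a tracked benchmark
        "score": 84.3,                                    # the claimed number
        "source_url": "https://example.com/table-3",      # where we can check it
    }
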
Flag

Claim that doesn’t match source

Incorrect score, broken link, wrong metric, stale SOTA. Any benchmark page has a flag button — we investigate and correct with a log entry.

Your flags →
Build

On the open JSON feed

Full registry lives at /data/benchmarks.json — no API key, no rate limit. Cite in papers, embed in dashboards, wire into agents.

Open the feed ↗
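
A minimal consumer of the feed. The host below is a placeholder for the site’s origin, and the JSON layout beyond “full registry” is left to inspection:

    import json
    import urllib.request

    # Placeholder host; the feed lives at /data/benchmarks.json on the site's origin.
    URL = "https://codesota.example/data/benchmarks.json"

    with urllib.request.urlopen(URL) as resp:
        registry = json.load(resp)

    # Structure beyond "full registry" is not documented here; inspect and adapt.
    print(type(registry).__name__)
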
§ 04 · Why it has to be this shape

Research infrastructure can’t depend on goodwill.

When Meta shut down Papers with Code in July 2025, seven years of task graphs, leaderboards and paper-to-code linkages went to a redirect. The data wasn’t lost — the paperswithcode-data repo is still frozen on GitHub — but the living, updating version of it was.

Open data from day one. Community contributions as the primary write path, not as an afterthought. No single point of failure — the benchmarks JSON is mirrorable, the log is append-only, the source URLs are checkable. Value compounds because every new score makes the next verification cheaper.

That’s the flywheel. Not a marketing graphic — a concrete commitment that every surface of this site feeds the others, and all of it stays readable even if the rest of the internet changes its mind about what’s load-bearing.

Spin the flywheel.

Every verified submission pins one more score; every correction leaves a log entry; every paper with a tracked benchmark turns up on /papers. The more turns, the harder the data is to move.

Submit a result · Read the log · What happened to PWC