How it works
The Data Flywheel
CodeSOTA picks up where Papers with Code left off: community contributions and verified results that compound over time.
Traction
Last 12 months. Analytics via Vercel.
The cycle
Seed data
79,817 papers and 9,327 benchmarks from Papers with Code. Seven years of ML research, preserved.
Fresh verification
We run benchmarks ourselves. Flag stale claims. Update results that papers got wrong.
Community submissions
Researchers submit new results. Vendors update scores. The community catches errors and fills gaps.
Continuous improvement
Better data attracts more users. More users submit more results. Quality compounds.
Where we are now
Live now
- OCR benchmarks — 16+ models, 9 datasets
- Browse system — PWC archive integrated
- SOTA timeline charts on dataset pages
- Open JSON API at /data/
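For readers wondering what a SOTA timeline is, here is a minimal sketch of one way to derive it from dated results: sort by date and keep only the entries that beat the best score seen so far. The `Result` class, its field names, and the sample models are hypothetical and are not CodeSOTA's actual schema or implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Result:
    # Hypothetical record shape, for illustration only.
    model: str
    published: date
    score: float


def sota_timeline(results, higher_is_better=True):
    """Return the results that set a new best score, oldest first."""
    timeline = []
    best = None
    # sorted() is stable, so same-day results keep their input order.
    for r in sorted(results, key=lambda r: r.published):
        if best is None:
            improved = True
        elif higher_is_better:
            improved = r.score > best
        else:
            improved = r.score < best
        if improved:
            timeline.append(r)
            best = r.score
    return timeline


# Made-up sample data, not real benchmark results.
sample = [
    Result("model-b", date(2021, 5, 1), 91.0),
    Result("model-a", date(2020, 1, 1), 88.0),
    Result("model-c", date(2022, 3, 1), 90.0),
    Result("model-d", date(2023, 1, 1), 93.5),
]
timeline_models = [r.model for r in sota_timeline(sample)]
```

With this sample, `model-c` is left out because it arrived after a better score already existed. Pass `higher_is_better=False` for metrics such as error rate, where lower is better.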
Coming next
- Speech recognition benchmarks
- Code generation leaderboards
- Automated benchmark verification
- Community voting on accuracy
- Contributor profiles
How to contribute
Submit new results
Every dataset page has a submission form. We review submissions and add verified results to the leaderboards.
Report issues
Incorrect score? Broken link? Results that don't match the source? Use the report button on any dataset page. Accurate data is the foundation.
Build on the data
All data is open JSON. Build dashboards, cite in papers, integrate into tools.
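As a starting point, here is a minimal sketch of loading one of the JSON files with the Python standard library. `BASE_URL` is a placeholder rather than the real site address, and the helper names are illustrative. The schema is not documented here, so `describe` only reports the top-level shape instead of assuming any field names.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Placeholder origin (an assumption): replace with the real site address.
BASE_URL = "https://example.org"


def data_url(name, base_url=BASE_URL):
    """Build the URL of a file under /data/, e.g. 'benchmarks.json'."""
    return urljoin(base_url, f"/data/{name}")


def fetch_json(name, base_url=BASE_URL, timeout=10.0):
    """Download and parse one of the open JSON files."""
    with urlopen(data_url(name, base_url), timeout=timeout) as response:
        return json.load(response)


def describe(payload):
    """Summarise the top-level shape without assuming a schema."""
    if isinstance(payload, list):
        return f"list with {len(payload)} items"
    if isinstance(payload, dict):
        return f"object with keys: {', '.join(sorted(payload))}"
    return type(payload).__name__
```

Once `BASE_URL` points at the real site, `describe(fetch_json("benchmarks.json"))` is a quick way to inspect the structure before building anything on top of it.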
/data/benchmarks.json
/data/datasets.json
Why this matters
When Meta shut down Papers with Code, the shutdown showed that critical research infrastructure can't depend on corporate goodwill.
The old model
- Company acquires community resource
- Promises to keep it open
- Priorities change, resource dies
- Community scrambles to preserve data
The new model
- Open data from day one
- Community contributions drive growth
- No single point of failure
- Value compounds over time
Built on Papers with Code
79,817 papers, 9,327 benchmarks, and 5,628 datasets from seven years of ML research. This foundation lets us focus on what PWC couldn't: continuous verification and community-driven updates.
| Inherited | Adding |
|---|---|
| Historical benchmark results | Fresh verification (2026) |
| Paper-to-code linkages | Working code verification |
| Static leaderboards | Community submissions + voting |
| Aggregated claims | We run benchmarks ourselves |
| Research-focused metrics | Practical recommendations |
Join the flywheel
Every submission improves the data. Every user makes the flywheel spin faster.