Independent ML Benchmark Tracking
State of the Art,
Verified.
17 research areas. 286+ benchmarks. Every result linked to its paper and code. No marketing claims — just data.
Research Areas
Browse benchmarks by domain
Computer Vision
10 tasks · Detection, segmentation, classification, OCR
NLP
9 tasks · Language models, QA, translation, NER
Reasoning
MATH, GSM8K · Mathematical, logical, commonsense
Code
6 tasks · Generation, SWE-bench, debugging
Speech
5 tasks · ASR, TTS, speaker verification
Medical
4 tasks · Imaging, diagnosis, clinical NLP
Multimodal
5 tasks · Vision-language, VQA, text-to-image
Agentic AI
5 tasks · Autonomous agents, HCAST, time horizon
Featured
OCR Benchmarks
164+ models compared across OmniDocBench, OCRBench v2, and olmOCR. Open-source models vs. vendor APIs.
SWE-bench SOTA
Which AI agents actually solve real GitHub issues? Latest verified scores.
Text Embeddings (MTEB)
Deep dive into the MTEB leaderboard. KaLM, Qwen3, Gemini, OpenAI compared across 8 task categories.
AI Building Blocks
Input-to-output transformations. Find the right model for your task.
AI Arena Rankings
Live human preference rankings across text, code, vision, search, and image generation.
Academy
Learn ML by reproducing real benchmarks. Theory → reproduce → improve → leaderboard.
Why this exists
When Meta shut down Papers with Code in July 2025, the ML community lost its reference for what state-of-the-art looks like. 9,327 benchmarks, 79,817 papers, gone overnight.
CodeSOTA rebuilds that infrastructure — independently. We verify results ourselves where possible, link every claim to its source paper and code, and publish all data as open JSON. No corporate owner that might pull the plug.
"Outstanding work. Just yesterday I was searching for good OCR comparisons and found only marketing BS. Good job!"
AI Consultant — Voice-AI at scale
"Super clean, slop-free UI, but most importantly the copy: very precise positioning and project overview."
Senior Architect
Open data
All benchmark data available as JSON. No API key, no rate limits. Build dashboards, cite in papers, integrate into your tools.
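As a minimal sketch of what "no API key, no rate limits" enables: the snippet below fetches the JSON dump and keeps the top-scoring entry per benchmark. The field names (`benchmark`, `model`, `score`) and the sample payload are illustrative assumptions, not a documented schema — check the actual file before relying on them.

```python
# Sketch: consume the open benchmark JSON (field names are assumptions).
import json
from urllib.request import urlopen

def load_benchmarks(url="https://codesota.com/data/benchmarks.json"):
    """Fetch the open benchmark dump (no API key required)."""
    with urlopen(url) as resp:
        return json.load(resp)

def best_per_benchmark(records):
    """Keep the highest-scoring entry per benchmark name (assumed fields)."""
    best = {}
    for r in records:
        name = r["benchmark"]
        if name not in best or r["score"] > best[name]["score"]:
            best[name] = r
    return best

# Offline demo with a hypothetical payload matching the assumed fields:
sample = [
    {"benchmark": "SWE-bench Verified", "model": "AgentX", "score": 62.1},
    {"benchmark": "SWE-bench Verified", "model": "AgentY", "score": 55.4},
]
print(best_per_benchmark(sample)["SWE-bench Verified"]["model"])  # AgentX
```

Swap `sample` for `load_benchmarks()` to run against the live data.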
Cite CodeSOTA
@misc{wikiel2025codesota,
  author = {Wikieł, Kacper},
  title  = {CodeSOTA: Independent ML Benchmark Tracking},
  year   = {2025},
  url    = {https://codesota.com},
  note   = {Accessed: 2025}
}

FAQ
What is CodeSOTA?
An independent ML benchmark tracking platform. Verified state-of-the-art results across 17 research areas — vision, NLP, reasoning, code, speech, medical AI, robotics, and more.
Is this a Papers with Code replacement?
We build on the PWC legacy after Meta shut it down in July 2025. 286+ benchmarks with links to implementations. Read the full story.
Are these benchmarks verified?
Yes. We run benchmarks independently where possible. All data includes source URLs and access dates. See methodology.
Can I use this data?
Yes. All data is open JSON at /data/benchmarks.json. Build dashboards, cite in papers, or integrate into your tools.
Stay current
New benchmarks and model comparisons, delivered occasionally.
No spam. Unsubscribe anytime.