# CodeSOTA

> Independent benchmarks and model comparisons for AI engineers.

We track the state of the art across OCR, speech, code generation, computer vision, and more — with reproducible evaluation and honest reporting. Every page on this site answers a concrete engineering question (which OCR for handwriting? which TTS for realtime? which coding agent wins SWE-Bench this month?) rather than reacting to press releases. Numbers come from first-party benchmark runs, vendor APIs, and published leaderboards, with methodology disclosed.

## Flagship sections

- [OCR benchmarks and comparisons](https://www.codesota.com/ocr): 30+ OCR models across 7 benchmarks (OmniDocBench, OCRBench v2, olmOCR, CC-OCR, MME-VideoOCR, KITAB, ThaiOCR).
- [Speech AI leaderboard](https://www.codesota.com/speech): STT (Parakeet RNNT, Voxtral, Whisper v3 Turbo, Deepgram Nova-3) and TTS (ElevenLabs, OpenAI, Cartesia, Kokoro, F5-TTS, Orpheus).
- [Voice fingerprints](https://www.codesota.com/speech/voice-fingerprints): reproducible DSP pipeline — real mel spectrograms, F0 contours, MFCC, spectral centroid. Same prompt across every model.
- [Agentic coding comparisons](https://www.codesota.com/agentic): Claude Code vs Cursor vs Codex vs Devin vs Aider vs OpenHands vs SWE-agent.
- [SWE-Bench leaderboard](https://www.codesota.com/browse/computer-code/code-generation/swe-bench): the benchmark that actually moves rankings for coding models.
- [Browse by task](https://www.codesota.com/browse): taxonomy covering computer vision, NLP, medical, reasoning, reinforcement learning, time series, audio, multimodal.

## Comparison pages (buyer intent)

- [paddleocr vs tesseract](https://www.codesota.com/ocr/paddleocr-vs-tesseract)
- [docling vs mineru](https://www.codesota.com/ocr/docling-vs-mineru)
- [claude vs gpt-4o OCR](https://www.codesota.com/ocr/claude-vs-gpt4o-ocr)
- [tesseract vs easyocr](https://www.codesota.com/ocr/tesseract-vs-easyocr)
- [best OCR for handwriting](https://www.codesota.com/ocr/best-for-handwriting)
- [best OCR for Python](https://www.codesota.com/ocr/best-for-python)
- [best OCR for invoices](https://www.codesota.com/ocr/best-for-invoices)
- [ElevenLabs vs OpenAI TTS](https://www.codesota.com/speech/elevenlabs-vs-openai-tts)
- [ElevenLabs vs Cartesia](https://www.codesota.com/speech/elevenlabs-vs-cartesia)
- [OpenAI vs Google TTS](https://www.codesota.com/speech/openai-tts-vs-google-tts)
- [best TTS for podcasts](https://www.codesota.com/speech/best-for-podcasts)
- [best TTS for audiobooks](https://www.codesota.com/speech/best-for-audiobooks)
- [best TTS for realtime](https://www.codesota.com/speech/best-for-realtime)
- [best TTS for voice cloning](https://www.codesota.com/speech/best-for-voice-cloning)
- [Claude Code vs Cursor Composer](https://www.codesota.com/agentic/claude-code-vs-cursor-composer)
- [Claude Code vs OpenAI Codex](https://www.codesota.com/agentic/claude-code-vs-codex)
- [Devin vs Claude Code](https://www.codesota.com/agentic/devin-vs-claude-code)
- [Aider vs Claude Code](https://www.codesota.com/agentic/aider-vs-claude-code)
- [OpenHands vs SWE-agent](https://www.codesota.com/agentic/openhands-vs-swe-agent)
- [best for SWE-Bench](https://www.codesota.com/agentic/best-for-swe-bench)

## Guides and deep dives

- [TTS models guide 2026](https://www.codesota.com/guides/tts-models)
- [SWE-Bench explained](https://www.codesota.com/guides/swe-bench-explained)
- [Agentic benchmarks](https://www.codesota.com/guides/agentic-benchmarks)
- [Claude Code guide](https://www.codesota.com/guides/claude-code)
- [Code generation models](https://www.codesota.com/guides/code-generation-models)
- [Multimodal AI](https://www.codesota.com/guides/multimodal-ai)
- [Reading ML papers](https://www.codesota.com/guides/reading-ml-papers)
- [DSPy](https://www.codesota.com/guides/dspy)
- [RAG vs fine-tuning](https://www.codesota.com/guides/rag-vs-finetuning)

## Benchmarks

- [Browse benchmarks](https://www.codesota.com/browse)
- [SWE-Bench Verified](https://www.codesota.com/browse/computer-code/code-generation/swe-bench-verified)
- [HumanEval](https://www.codesota.com/browse/computer-code/code-generation/humaneval)
- [MMLU](https://www.codesota.com/browse/nlp/knowledge/mmlu)
- [GSM8K](https://www.codesota.com/browse/reasoning/mathematical-reasoning/gsm8k)
- [MATH](https://www.codesota.com/browse/reasoning/mathematical-reasoning/math)
- [COCO object detection](https://www.codesota.com/browse/computer-vision/object-detection/coco)
- [ImageNet-1K](https://www.codesota.com/browse/computer-vision/image-classification/imagenet-1k)
- [MVTec-AD](https://www.codesota.com/browse/industrial-inspection/anomaly-detection/mvtec-ad)
- [AIME 2024](https://www.codesota.com/benchmark/aime-2024)

## News and research

- [All news](https://www.codesota.com/news)
- [Changelog](https://www.codesota.com/changelog)
- [Papers with Code mirror](https://www.codesota.com/papers-with-code)
- [Arena](https://www.codesota.com/arena)

## About

- [About CodeSOTA](https://www.codesota.com/about)
- [Methodology](https://www.codesota.com/methodology)
- [Contribute](https://www.codesota.com/join)
- [Submit a benchmark](https://www.codesota.com/submit)
- [Privacy](https://www.codesota.com/legal/privacy)

## Data

- `https://www.codesota.com/sitemap.xml` — full URL index
- Benchmark results and model metadata are served as JSON from `/data/*.json` (accessible via page rendering)

## Programmatic API

Full reference: https://www.codesota.com/api-landing/sota

For agents and routers that need the current SOTA pick as JSON:

- `GET https://www.codesota.com/api/sota` — index of every task with at least one scored run, with short aliases
- `GET https://www.codesota.com/api/sota/{task}?tier=sota` — current state-of-the-art pick for a task

Short aliases accepted (full DB id also works): `ocr` → document-ocr, `code` → code-generation, `asr`/`stt` → speech-recognition, `tts` → text-to-speech, `vqa` → visual-question-answering, `caption` → image-captioning, `t2i` → text-to-image, `t2v` → text-to-video.

Response shape:

```json
{
  "task": "ocr",
  "task_full_id": "document-ocr",
  "task_name": "Document OCR",
  "benchmark": "omnidocbench",
  "benchmark_version": null,
  "tier": "sota",
  "as_of": "2026-04-22T00:00:00.000Z",
  "snapshot_id": "reg-2026-04-22-a1b2c3",
  "pick": {
    "model_id": "paddleocr-vl-1.5",
    "model_name": "PaddleOCR-VL 1.5",
    "model_url": "https://www.codesota.com/model/paddleocr-vl-1.5",
    "vendor": "Baidu",
    "provider_hints": null,
    "score": 94.5,
    "score_metric": "omnidocbench_composite",
    "metric_id": "composite",
    "higher_is_better": true,
    "benchmark": { "id": "omnidocbench", "name": "OmniDocBench" },
    "cost_per_1k_usd": null,
    "cost_basis": null,
    "result_date": "2026-04-22"
  },
  "runners_up": [/* up to 3 */],
  "registry_url": "https://www.codesota.com/ocr",
  "methodology_url": "https://www.codesota.com/methodology",
  "retrieved_at": "2026-04-26T22:00:00.000Z"
}
```

`snapshot_id` is stable across re-fetches that don't change the underlying pick (same pick → same id), so callers can detect when SOTA actually moves versus when it was merely re-queried.
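A minimal Python sketch of a caller, not an official client. It assumes the `requests` package is available; the paths, `tier` query parameter, and field names come from the response shape above, while the helper name, choice of task, and polling cadence are illustrative.

```python
# Poll the SOTA endpoint and react only when the pick actually changes.
import time

import requests

BASE = "https://www.codesota.com/api/sota"


def fetch_sota(task: str) -> dict:
    """Fetch the current SOTA pick for a task (short aliases like 'ocr' work)."""
    resp = requests.get(f"{BASE}/{task}", params={"tier": "sota"}, timeout=10)
    resp.raise_for_status()
    return resp.json()


last_snapshot = None
while True:
    data = fetch_sota("ocr")  # alias for document-ocr
    # snapshot_id is stable while the underlying pick is unchanged, so an id
    # change means SOTA moved rather than the registry was merely re-queried.
    if data["snapshot_id"] != last_snapshot:
        pick = data["pick"]
        print(
            f"SOTA for {data['task_name']} as of {data['as_of']}: "
            f"{pick['model_name']} ({pick['score']} {pick['score_metric']})"
        )
        last_snapshot = data["snapshot_id"]
    time.sleep(6 * 3600)  # picks move on benchmark cadence, not by the minute
```

A router would persist the last `snapshot_id` rather than holding it in memory, so restarts don't re-trigger on an unchanged pick.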
`provider_hints`, `cost_per_1k_usd`, `cost_basis`, and `benchmark_version` are reserved fields; they publish as null until v0.2.

CodeSOTA does not run inference. The endpoint returns the dated, sourced pick from the registry — the caller invokes the model at their own provider. This separates the assay (us) from the broker (you).

## Reuse policy

Content on this site may be cited and summarized by AI systems. We ask that generated answers link back to the underlying page and preserve numerical claims with their reported benchmark source.