Changelog
Latest updates, benchmarks, and documentation releases.
11 Interactive Paradox Explainers
- 11 3Blue1Brown-style interactive explainers covering mathematical paradoxes
- Stein's Paradox: How wheat prices help predict baseball averages
- Will Rogers Phenomenon: Stage migration in cancer survival statistics
- Berkson's Paradox: Selection bias in dating and hospitals
- Low Birth Weight Paradox: Simpson's paradox in epidemiology
- Schelling's Segregation: Agent-based model with smart movement algorithm
- Ross-Littlewood Paradox: Infinite sets and supertasks
- Banach-Tarski Paradox: Measure theory and the axiom of choice
- Newcomb's Paradox: Decision theory and free will
- Arrow's Impossibility Theorem: Why fair voting is mathematically impossible
- Cobra Effect: Incentive design and Goodhart's Law
- Grossman-Stiglitz Paradox: Why efficient markets can't exist
Massive expansion of the Explainers section with 11 comprehensive interactive paradox explainers. Each features multiple interactive simulations, games, and visualizations that let users experience the paradox firsthand. Schelling's Segregation includes smart agent movement for better convergence at high thresholds.
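To make the Schelling mechanic concrete, here is a minimal one-round sketch in Python. The grid encoding, the 0.3 tolerance threshold, and the exact "best empty cell" heuristic are illustrative assumptions, not the explainer's actual implementation:

```python
import random

def schelling_step(grid, size=20, threshold=0.3):
    """One round of a minimal Schelling model on a size x size grid.

    grid: dict mapping (row, col) -> group id (0 or 1); empty cells are absent.
    threshold: minimum fraction of same-group neighbours an agent tolerates.
    """
    empties = [(r, c) for r in range(size) for c in range(size) if (r, c) not in grid]

    def satisfaction(pos, group):
        r, c = pos
        neighbours = [grid.get((r + dr, c + dc))
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
        occupied = [g for g in neighbours if g is not None]
        return 1.0 if not occupied else sum(g == group for g in occupied) / len(occupied)

    for pos, group in list(grid.items()):
        if satisfaction(pos, group) < threshold and empties:
            # "Smart movement": instead of hopping to a random empty cell, pick
            # the one that maximises the agent's satisfaction -- this is what
            # helps convergence at high thresholds.
            target = max(empties, key=lambda cell: satisfaction(cell, group))
            empties.remove(target)
            empties.append(pos)
            grid[target] = grid.pop(pos)
    return grid

# Seed a random 20x20 grid at ~80% occupancy and run a few rounds:
grid = {(r, c): random.choice((0, 1))
        for r in range(20) for c in range(20) if random.random() < 0.8}
for _ in range(10):
    grid = schelling_step(grid)
```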
Rys OCR: Polish SOTA OCR Model (Research Preview)
- First fine-tuned Polish OCR model released on HuggingFace
- 71.3% Character Error Rate (CER) reduction on Polish text
- 46.1% Word Error Rate (WER) reduction
- LoRA fine-tune on the PaddleOCR-VL base model
- Optimized for Polish diacritics: ą, ć, ę, ł, ń, ó, ś, ź, ż
- Runs on consumer hardware (4-6 GB VRAM)
- Apache 2.0 license, fully open source
- Call for contributions: datasets, benchmarks, R&D collaboration
Rys OCR is the first release in ongoing R&D to build state-of-the-art Polish text recognition. Trained on 10,000 synthetic Polish document images (addresses, invoices, receipts, dates, names). Looking for contributors to help with real Polish datasets, benchmark evaluations, and model improvements.
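For reference, the CER/WER figures above are relative improvements over the base model. A minimal sketch of how such numbers are computed with the jiwer library (the example strings here are made up):

```python
import jiwer

reference = "Zażółć gęślą jaźń"        # ground-truth Polish text
baseline_pred = "Zazolc gesla jazn"    # base model drops diacritics
finetuned_pred = "Zażółć gęślą jaźń"   # fine-tune preserves them

base_cer = jiwer.cer(reference, baseline_pred)
ft_cer = jiwer.cer(reference, finetuned_pred)

# Relative reduction, the form reported above (e.g. 71.3% CER reduction)
reduction = (base_cer - ft_cer) / base_cer * 100
print(f"CER: {base_cer:.3f} -> {ft_cer:.3f} ({reduction:.1f}% reduction)")
```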
21 New 3Blue1Brown-Style Interactive Explainers
- 21 new explainer components with interactive visualizations
- Face Anonymization: detection pipelines, blurring vs pixelation vs generative inpainting
- PII Detection: entity types, confidence thresholds, redaction strategies
- Text Reranking: bi-encoder vs cross-encoder, relevance scoring
- Hallucination Detection: factual grounding, source verification methods
- Hybrid Retrieval: BM25 + dense vector fusion, reciprocal rank fusion (sketched below)
- Controllable Generation: temperature, top-k/p, repetition penalty, CFG
- Chart Understanding: chart type detection, data extraction pipelines
- Question Answering: extractive vs abstractive, span prediction
- Long Context Summarization: chunking strategies, hierarchical approaches
- Video-to-Text: frame sampling, temporal understanding, captioning
- Code Generation: syntax-aware models, repair and completion
- Audio/Video Processing: emotion recognition, action recognition, tracking
- 50+ total building blocks now have interactive explainers
Massive expansion of Building Blocks with 21 new 3Blue1Brown-style interactive explainers. Each component features step-by-step visualizations, architecture diagrams, and practical code examples. Covers advanced AI capabilities from face anonymization to hallucination detection to video understanding.
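One example of the material covered: the Hybrid Retrieval explainer's reciprocal rank fusion. A minimal sketch of the standard RRF formula, score(d) = Σ 1/(k + rank_d), with made-up document ids:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids; k=60 is the conventional constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # lexical (BM25) ranking
dense_hits = ["doc1", "doc9", "doc3"]   # dense vector-similarity ranking
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# doc1 and doc3 rise to the top because both rankers agree on them
```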
Next.js 16 Migration & OCR Labeling Platform
- Complete migration from Astro to Next.js 16.1.1 with App Router
- New OCR Labeling Platform: upload images, get bounding boxes via DOTS OCR (Replicate)
- Human-in-the-loop data flywheel for document processing quality improvement
- 27+ interactive explainer components migrated to React (LLM, VLM, TTS, etc.)
- CodeBlock component with Prism.js syntax highlighting and .ipynb download
- All dynamic routes fixed for the Next.js 15+ async params pattern
- New /benchmark/[id] and /[area]/compare/[...slug] pages
- TypeScript compilation verified across entire codebase
Major infrastructure release migrating from Astro to Next.js for better performance, SSR, and React ecosystem integration. The new OCR Labeling Platform enables community-driven quality improvement: upload documents, review AI-extracted text with bounding boxes, and submit corrections. Interactive Building Blocks explainers (LLMExplainer, ImageCaptioningExplainer) now use React with useState for full interactivity.
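A rough sketch of the platform's OCR step, assuming the Replicate Python client; the model slug and output shape are placeholders, not the platform's actual code:

```python
import replicate

# Send an uploaded image to an OCR model hosted on Replicate.
with open("invoice.png", "rb") as image:
    output = replicate.run(
        "example-owner/dots-ocr",  # placeholder slug; look up the real model on Replicate
        input={"image": image},
    )

# Expected: recognized text plus bounding-box regions, which a human reviewer
# then corrects -- closing the data flywheel described above.
print(output)
```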
Comprehensive SOTA Editorials for Major AI Areas
- Added expert editorials for 10 major AI areas with 500+ citations
- Speech: Whisper, Conformer, XTTS voice cloning, sub-200ms TTS latency insights
- NLP: GPT-5, Claude 3.5, DeepSeek-V3 comparisons, RAG adoption patterns
- Computer Code: SWE-bench leaders, RLVR training paradigm, package hallucination risks
- Reasoning: o3/o4-mini math performance, test-time compute scaling analysis
- Multimodal: Open-source parity (InternVL3, Molmo 2), hallucination mitigation
- Agentic AI: METR benchmarks, MCP/A2A protocols, production deployment reality
- Audio: Suno v4.5 music generation, MSEB benchmark gaps, mHuBERT edge deployment
- Robotics: OpenVLA 7B outperforming RT-2-X 55B, COLOSSEUM brittleness findings
- Medical: GPT-4o USMLE 90.4%, FDA approval generalization gaps, BoltzGen drug discovery
- 500+ citations from peer-reviewed papers (NeurIPS, ICML, CVPR, ACL)
Major content release adding practitioner-focused editorials to all major AI areas. Each editorial includes: State of the Field with specific benchmark scores, Hot Takes with honest insights, and Production Recommendations for different use cases. Comprehensive research with 500+ citations from peer-reviewed papers and major conferences.
The Zen of AI Composition: Free PDF Released
- Book now available for free download - no email required
- Direct PDF download with download counter tracking
- Building intelligent systems from first principles
- Three parts: Nature of Composition, Transformations, Practice
"The Zen of AI Composition" is now available as a free PDF download. A philosophical guide to building intelligent systems - covering the history of AI transformations, modular composition, and evidence-based prompting techniques.
The Zen of AI Composition: Book Early Access
- New book landing page with early access signup
- Double opt-in email confirmation via Resend
- Admin notifications on confirmed signups
- Book covers AI composition from first principles
- Three parts: Nature of Composition, Transformations, Practice
Announcing "The Zen of AI Composition" - a philosophical guide to building intelligent systems. Sign up for early access to receive the book first and help shape the final version. Double opt-in email flow ensures only engaged readers join the list.
Decision Tools: Model Comparator, Verification Protocol, Intent Analytics
- Interactive Model Comparator: Select 2-4 OCR models for side-by-side comparison
- Failure mode comparison: diacritics, tables, stamps, handwriting, low quality
- Shareable comparison URLs with query params for team decisions
- Verification Protocol page: 5-step benchmark verification process
- VERIFIED badge schema: dataset hash, prompt/config, runtime, cost, metric code
- Three verification tiers: Self-Reported, CodeSOTA Verified, Continuous Monitoring
- Decision intent analytics: scroll depth, time on page, CTA clicks, outbound tracking
- Atropos LLM RL guide: Nous Research framework for OCR evaluation and training
- Standalone OCR evaluation script for testing vision models on OCR-VQA
This release continues the strategic transformation with enterprise decision tools. The Model Comparator lets teams compare 10 OCR models across 8 metrics with failure mode analysis. The Verification Protocol establishes trust through transparent methodology. Decision intent analytics track how users make choices, enabling continuous improvement of the decision platform.
OCR Decision Platform: From Catalog to Decision Engine
- New canonical OCR Decision Guide page with failure taxonomy focus
- Homepage transformed: OCR hero with 90-second clarity messaging
- Failure taxonomy: diacritics, column bleed, numeric substitution, table collapse, stamp interference
- Decision matrix: "If your priority is X, choose Y" format
- Private OCR Evaluation Preview with waitlist signup
- Independence & Conflict of Interest Policy on methodology page
- GDPR compliance and EU data residency messaging
- EvaluationCTA component added to all 5 comparison pages
- Navigation updated: OCR highlighted first in cyan
Major strategic release transforming CodeSOTA from "benchmark catalog" to "decision platform". The new /ocr/decision page is the canonical OCR decision artifact - focusing on failure modes (what breaks) rather than accuracy percentages. Homepage now leads with OCR, includes 90-second clarity test (Who/What/Why/Next), and features Private Evaluation Preview. All comparison pages now include Request Evaluation CTAs.
Agentic AI Benchmarks: METR Time Horizon & Path to AGI
- New Agentic AI page: METR benchmarks tracking autonomous AI capabilities
- Time Horizon leaderboard: GPT-5.1-Codex-Max (160 min), GPT-5, o1-preview, Claude 3
- HCAST, RE-Bench, SWAA task suite breakdowns
- Interactive benchmark saturation chart (JS/Chart.js) with category views
- Building blocks now connected to all /browse/[area] pages
- 27 benchmarks across 8 categories including new Agentic category
- 7-month doubling time trend analysis for AGI timeline
Major release focused on agentic AI capabilities, a key signal of AGI progress. Added comprehensive METR benchmark tracking, including time horizon (how long an AI can work autonomously) and HCAST scores. Interactive JS-based saturation charts show how different benchmark categories are approaching ceiling performance. Browse pages now show relevant building blocks for each research area.
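A back-of-the-envelope version of the 7-month doubling-time projection mentioned above, using the 160-minute figure from the leaderboard; the numbers are illustrative, not METR's published fit:

```python
def projected_horizon(current_minutes, months_ahead, doubling_months=7):
    """Exponential extrapolation: horizon doubles every doubling_months."""
    return current_minutes * 2 ** (months_ahead / doubling_months)

for months in (0, 7, 14, 28):
    print(f"+{months:>2} months: ~{projected_horizon(160, months):,.0f} min")
```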
Building Blocks: 6 More Interactive Explainers
- Image Captioning: VLM deep-dive with LLaVA, Qwen2-VL, BLIP-2, GPT-4V architecture comparison
- Text-to-Video: Sora, Runway Gen-3, CogVideoX, Diffusion Transformer (DiT) architecture
- Image-to-Image: Inpainting, outpainting, super resolution, ControlNet, IP-Adapter
- Text-to-3D: DreamFusion, Shap-E, MVDream, LGM, Score Distillation Sampling explained
- Image-to-Video: Stable Video Diffusion, AnimateDiff, LivePortrait, Runway API
- Depth Estimation: Enhanced with real example images (mountain, street, indoor, portrait)
- Now 14+ comprehensive explainers covering all major AI modalities
Continued expansion of Building Blocks with 6 new interactive explainers. Each includes architecture diagrams, model evolution timelines, practical code examples, and performance comparisons. Depth estimation now features real input/output examples with turbo colormap visualization.
Building Blocks Expansion: 8 Interactive Explainers
- Object Detection: YOLO evolution (v1-v11), NMS, two-stage vs single-stage, mAP metrics
- Image Segmentation: SAM 2, semantic/instance/panoptic types, mask formats, Mask2Former
- Depth Estimation: Depth Anything v2, ZoeDepth, Marigold, metric vs relative depth
- Image to 3D: Gaussian Splatting, NeRF, Trellis, single-image 3D generation
- Speech Recognition: Whisper deep-dive, turbo vs large-v3, faster-whisper, diarization
- 27+ building blocks covering vision, NLP, audio, video, and 3D modalities
- Comprehensive code examples for each modality with multiple frameworks
- Interactive visualizations: attention matrices, depth colormaps, architecture diagrams
Massive expansion of the Building Blocks section. Added 8 comprehensive interactive explainers covering the core AI modalities. Each explainer includes architecture deep-dives, model evolution timelines, practical code examples (YOLO, SAM, Whisper, etc.), and interactive visualizations. The goal: be the best resource for understanding how each AI capability actually works.
Modular Benchmark Runner & Mistral OCR 2512 Verification
- New modular benchmark runner system with pluggable backends
- Mistral OCR 2512 (Mistral 3 OCR) verified and tested
- Stanford Churro (CHURRO-DS) benchmark integration
- OCRBench v2 runner with official evaluation support
- HTTP API daemon for remote GPU benchmark execution
- Checkpoint-based resumable benchmark runs
- Automated results sync to website data files
Major infrastructure release: The benchmark-runner now supports modular benchmark backends (Mistral OCR, Churro, OCRBench v2, OmniDocBench). Verified Mistral OCR 2512 performance: 9 pages in 7.37 seconds with high-quality markdown output. Stanford Churro integration enables historical document OCR benchmarks across 46 languages. All runners support checkpointing for resumable runs.
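The checkpointing idea, in sketch form; the file layout and run_model() helper are illustrative, not the runner's actual API:

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")

def run_benchmark(samples, run_model):
    """Score samples, persisting results so an interrupted run can resume."""
    done = json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}
    for sample in samples:
        sid = str(sample["id"])
        if sid in done:
            continue  # already scored in a previous, interrupted run
        done[sid] = run_model(sample)
        CHECKPOINT.write_text(json.dumps(done))  # flush after every sample
    return done
```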
Interactive AI Explainers: LLM & TTS Deep Dives
- LLM Explainer: How transformers work with 5 interactive sections
- TTS Explainer: Complete text-to-speech pipeline visualization
- Interactive tokenization demo with BPE explanation
- Attention mechanism visualization with clickable matrix
- Next token prediction with probability distributions
- Mel spectrogram and waveform canvas visualizations
- Voice cloning methods: speaker embedding, in-context learning, fine-tuning
- Neural codec language model explanation (VALL-E, ElevenLabs-style)
Two comprehensive interactive explainers for the Building Blocks pages. The LLM explainer covers tokenization, embeddings, attention, next-token prediction, and transformer architecture. The TTS explainer covers text normalization, G2P, prosody, acoustic models, mel spectrograms, vocoders, and zero-shot voice cloning. All with interactive canvas-based visualizations.
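The tokenization section demonstrates BPE interactively; the same idea in code, using the tiktoken library as a stand-in BPE implementation:

```python
import tiktoken

# cl100k_base is the BPE vocabulary used by several OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("State of the art, verified")
print(tokens)                              # token ids
print([enc.decode([t]) for t in tokens])   # the subword pieces BPE produced
```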
Building Blocks & Editorial Guides
- Building Blocks: Modular AI capabilities taxonomy (image-to-vector, text-to-vector, etc.)
- Editorial Guides for 3 personas: Executives, Enthusiasts, Researchers
- Executive Guide: Document Processing Technology Matrix with vendor comparison
- Enthusiast Guide: SOTA Tracker with current leaders and reproduction tips
- Research Guide: ML Landscape 2025 with trend analysis and gap identification
- Data Flywheel page explaining community-driven benchmark growth
- LLM and Object Detection hub pages
- PWC Archive: 1,519 papers, 464 models, 145 datasets integrated
Major release introducing Building Blocks - a new way to think about AI capabilities as modular transformations (image-to-vector, audio-to-text, etc.). Added comprehensive editorial guides for different user personas: CTOs get vendor comparison matrices, enthusiasts get SOTA tracking tools, researchers get trend analysis. All built on Papers with Code archive data.
SEO Improvements & Production Auth
- Papers with Code alternative page SEO optimization
- Added FAQ section targeting "People Also Ask"
- Clerk production authentication with GitHub OAuth
- User work profile preferences in dashboard
- Sitemap fixed with correct www domain
- Removed custom analytics (using Vercel Analytics)
SEO improvements for the Papers with Code story page including optimized title, meta description, FAQ section, and internal links. Switched Clerk to production mode with GitHub OAuth. Added work profile preferences feature for logged-in users to describe their ML focus areas.
User Accounts & Email Capture
- User authentication via Clerk (GitHub OAuth)
- Protected dashboard for authenticated users
- Sign-in and sign-up pages with dark theme styling
Added user account system using Clerk for authentication. Users can sign in with GitHub. Protected routes redirect unauthenticated users to sign-in.
CodeSOTA Polish OCR Benchmark
- 1,000 synthetic and real Polish text images with ground truth
- 4 categories: synth_random, synth_words, real_corpus, wikipedia
- 5 degradation levels: clean, light, medium, heavy, severe
- Tesseract 5.5.1 baseline: 26.3% CER overall
- Contamination-resistant design exposes LM dependence (52% vs 5% CER)
- Dedicated Polish OCR page with category breakdown and key findings
Our own Polish OCR benchmark designed to detect language model reliance vs pure character recognition. Synthetic categories (no dictionary fallback) show 10x worse performance than real text, exposing heavy dependence on statistical language models. Dataset includes 5 degradation levels using Augraphy to simulate real document scanning conditions.
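A sketch of how Augraphy-style degradation works, using the library's default pipeline; the benchmark's actual five-level configuration is not shown here:

```python
import cv2
from augraphy import default_augraphy_pipeline

# Apply scanner-style noise (ink bleed, creases, dirty rollers, ...) to a clean render.
clean = cv2.imread("clean_polish_text.png")
pipeline = default_augraphy_pipeline()
degraded = pipeline.augment(clean)["output"]  # augment() returns a dict with the result
cv2.imwrite("degraded.png", degraded)
```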
Mistral OCR 3 Added
- New Mistral OCR 3 model (mistral-ocr-2512) added to benchmarks
- Dedicated review page with pricing, code examples, benchmarks
- 94.9% claimed accuracy, 74% win rate over OCR 2
- $2/1000 pages ($1 with batch API)
- Benchmark results: olmOCR-bench, CER, WER metrics
Added comprehensive coverage of Mistral OCR 3 released December 2025. Includes benchmark comparisons with GPT-4o and PaddleOCR, pricing analysis, code examples, and use case recommendations.
Featured Guides on Landing Page
- New "In-Depth Comparisons" section on homepage with image cards
- 6 featured editorial guides: OCR comparisons, Audio AI, Medical Radiology
- Images surfaced for better Google Images discovery
- Internal linking boost for deep content pages
Landing page now showcases editorialized content with images for better SEO discovery. Featured guides include PaddleOCR vs Tesseract, GPT-4o vs PaddleOCR, Best OCR for Invoices, Best OCR for Handwriting, Audio AI Benchmarks, and Chest X-ray AI Models.
Audio AI Benchmarks: Classification, Music Generation & More
- Audio AI overview: AudioSet, ESC-50, music generation, audio captioning
- Audio Classification deep-dive: BEATs at 0.498 mAP, 98.1% on ESC-50
- Music Generation comparison: Suno, Udio, MusicGen, Stable Audio
- 7 custom visualizations: spectrograms, waveforms, model comparisons
- Evaluation metrics explained: mAP, FAD, MOS, CLAP scores
- Architecture breakdowns: CNN vs Vision Transformer vs Audio Tokenizer
Comprehensive Audio AI vertical with 3 editorial pages covering classification (AudioSet, ESC-50), music generation (Suno, Udio, MusicGen), and audio understanding (Qwen2-Audio, SALMONN). Includes custom-generated visualizations and practical model recommendations by use case.
GPU Hardware Benchmarks: RTX 3090 vs 4090 vs 5090
- Compare RTX 3090, 4090, 5090 for ML workloads
- LLM inference: Llama 3, Mistral, with tokens/sec metrics
- Image generation: SDXL, Flux, SD 1.5 benchmarks
- Training: LoRA fine-tuning, YOLO, ResNet performance
- VRAM requirements guide: which models fit on which GPU
- Cloud GPU pricing from RunPod, vast.ai, Lambda Labs
New Hardware section with comprehensive GPU comparison for ML. Includes specs, real-world benchmarks across LLM inference, image generation, training, and computer vision. Features recommendations on which GPU to buy and cloud pricing comparison.
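For context, tokens/sec figures like these are typically measured by timing a fixed generation. A minimal sketch with Hugging Face transformers, using gpt2 as a stand-in for the larger models benchmarked here:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in; the benchmarks above use Llama 3 / Mistral class models
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

inputs = tok("The quick brown fox", return_tensors="pt").to(device)
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False,
                     pad_token_id=tok.eos_token_id)
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```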
Polish OCR Benchmarks
- 4 Polish OCR datasets: PolEval 2021, IMPACT-PSNC, reVISION, Polish EMNIST
- Models: Tesseract Polish, ABBYY FineReader, HerBERT, Polish RoBERTa
- Best CER: 2.1% on PolEval 2021, 97.5% word accuracy on IMPACT
- Covers diacritics challenges and gothic font recognition
Dedicated Polish OCR benchmark page covering historical documents from 1791-1998, gothic fonts, and Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż). Features both raw OCR engines and NLP post-correction approaches.
Industrial Anomaly Detection Benchmarks
- 8 industrial datasets: MVTec AD, VisA, weld defects, steel defects
- 12 anomaly detection models: PatchCore, EfficientAD, SimpleNet, FastFlow
- Best AUROC: 99.6% (SimpleNet on MVTec AD)
- Covers weld inspection, steel defects, surface inspection
- Three approaches: Memory Bank, Normalizing Flows, Student-Teacher
Industrial inspection vertical for manufacturing quality control. Covers anomaly detection for defects, weld X-ray inspection, and steel surface analysis. Includes metrics explainer for AUROC vs PRO.
Chest X-Ray AI: Radiology Benchmarks Launch
- 7 chest X-ray datasets: CheXpert, MIMIC-CXR, NIH ChestX-ray14, VinDr-CXR, PadChest, RSNA, COVID-19
- 15 radiology AI models: CheXNet, CheXzero, TorchXRayVision, MedCLIP, GLoRIA, BioViL
- 20+ benchmark results with AUC scores across datasets
- Interactive cross-dataset comparison chart
- Data pipeline explainer: DICOM to multi-label classification
Major expansion of Medical AI with a dedicated Chest X-Ray benchmark page. Features 900K+ images across 7 major datasets, leaderboard sorted by CheXpert AUC, and cross-dataset generalization analysis. Covers the rise of vision-language models (CLIP-based) and the label noise problem in radiology.
SEO & Accessibility Improvements
- Dataset schema on benchmark pages for Google Dataset Search
- Dynamic meta descriptions with SOTA model and scores
- FAQPage schema on Speech and Code Generation pages
- Canvas accessibility: aria-labels and fallback text on DocumentScanner
- BreadcrumbList schema for improved navigation structure
Major SEO improvements following audit recommendations. Benchmark pages now include schema.org/Dataset structured data for visibility in Google Dataset Search. Meta descriptions dynamically include the current SOTA model and score. Speech and Code Generation verticals now have FAQPage schema for rich snippets.
Major Content Expansion: 6 New Verticals
- NLP vertical: GLUE, SuperGLUE, SQuAD benchmarks with 20+ models
- Speech vertical: Whisper vs Azure, LibriSpeech benchmarks
- Multimodal vertical: VQA, image captioning, GPT-4V vs Gemini
- Reasoning vertical: MATH, GSM8K, GPQA, o1 vs GPT-4 comparison
- LLM comparison hub: GPT-4 vs Claude head-to-head analysis
- Code generation: best-for Python, JavaScript, debugging guides
- OCR expansion: receipts, tables, multilingual, 3 new comparisons
20+ new pages across 6 research verticals. Each vertical includes landing pages, benchmark deep dives, and model comparisons. Navigation updated to include NLP, Speech, Multimodal, Reasoning, and LLM sections.
OCR Arena Speed vs Quality Visualization
- Interactive scatter plot: ELO score vs latency
- 18 models from OCR Arena human preference rankings
- Green dots for open source, red for closed/API
- Key insights: best quality, best balance, fastest
- Full rankings table with win rates and battle counts
New visualization page showing the speed vs quality tradeoff for OCR models based on human preference data from OCR Arena. Helps identify Pareto-optimal models for different use cases.
CodeSOTA Meta-Benchmark Score
- Aggregate score across 8 OCR benchmarks
- Weighted scoring: primary (3x), secondary (2x), tertiary (1x)
- Interactive heatmap: models vs benchmarks
- Coverage tracking: see which models need testing
- Testing priority list for contributors
Introducing the CodeSOTA Score - a single number to compare OCR models across multiple benchmarks. Primary benchmarks (OmniDocBench, OCRBench v2, olmOCR-Bench) weighted 3x, secondary (CHURRO-DS, CC-OCR) 2x, language-specific 1x. Visual heatmap shows exactly where data is missing.
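The aggregation rule, in code form; the scores below are placeholders, and normalization details are omitted:

```python
WEIGHTS = {"primary": 3, "secondary": 2, "tertiary": 1}

def codesota_score(results):
    """results: list of (tier, normalized_score) pairs for one model.

    Tier-weighted average over whichever benchmarks the model has results for.
    """
    total = sum(WEIGHTS[tier] * score for tier, score in results)
    weight = sum(WEIGHTS[tier] for tier, _ in results)
    return total / weight

# e.g. one primary, one secondary, one tertiary result:
print(codesota_score([("primary", 83.1), ("secondary", 76.0), ("tertiary", 91.2)]))
```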
Papers With Code Database Integration
- 1,500+ benchmark results from PWC archive
- SOTA Timeline: interactive hill-climbing charts
- 146 datasets, 464 models indexed
- 15 research areas with 70+ tasks defined
- NLP, Reasoning, Code, Speech, Medical and more
Major release integrating the full Papers With Code archive. Browse historical benchmark results with the classic "hill climbing" visualization showing SOTA progression over time. All major research areas now have defined tasks - NLP (9 tasks), Reasoning (5), Code (6), Speech (5), and more.
Papers With Code Story
- Complete history of Papers With Code (2018-2025)
- Why it mattered for ML research
- What was lost when Meta shut it down
- Why CodeSOTA exists to fill the gap
- Cost vs Quality frontier graph on vendors page
New page explaining the story of Papers With Code - what it was, why it was invaluable, and why there is a vacuum after Meta "sunsetted" it in July 2025. Also added interactive cost vs quality graph to the vendors comparison.
Homepage Redesign & OCR Vendors Page
- New hero section: "State of the Art, Verified"
- Papers With Code successor positioning
- OCR Vendors comparison page with 9 vendors
- Decision matrix for different use cases
- LinkedIn banner for social media
Major update to homepage positioning CodeSOTA as the next generation of ML benchmarking. New OCR vendors page consolidates all options (Mistral, Docling, GPT-4o, PaddleOCR, Tesseract, Google Doc AI, Azure, doctr, Chandra) with practical decision guidance.
Mistral OCR Documentation
- Mistral OCR API guide with Python examples
- Benchmark claims: 94.9% accuracy, 2000 pages/min
- Pricing comparison: $0.001/page vs competitors
- Independent testing caveats documented
- Mistral vs Docling comparison table
Added comprehensive documentation for Mistral OCR API. Includes both official benchmark claims and independent testing results showing mixed performance on complex layouts.
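A sketch of the documented flow using the public mistralai Python SDK; the model name and response fields may differ between SDK releases:

```python
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
response = client.ocr.process(
    model="mistral-ocr-latest",
    document={"type": "document_url", "document_url": "https://example.com/sample.pdf"},
)
for page in response.pages:
    print(page.markdown)  # per-page markdown output
```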
Docling Tutorial Verified
- All code executed and verified on real documents
- Real outputs: 33,201 chars markdown from 10-page PDF in 34.95s
- Table extraction verified: 3 tables with CSV export
- Downloadable artifacts from actual test run
- Performance metrics from Apple Silicon with MPS acceleration
The Docling tutorial now includes real, verified outputs from processing the Docling arxiv paper. No more AI-generated placeholder snippets - every code block has been executed and the actual results are shown.
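The tutorial's core conversion step, sketched with Docling's public API (the URL points at the Docling technical report on arXiv):

```python
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("https://arxiv.org/pdf/2408.09869")  # the Docling paper
markdown = result.document.export_to_markdown()
print(len(markdown), "characters of markdown")
```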
Docling Documentation Added
- Complete Docling documentation following Diataxis framework
- Tutorial: PDF to Markdown conversion
- How-To Guides: OCR engines, table extraction, RAG integration
- Technical Reference: API docs, model specs
- Explanation: Architecture deep-dive
IBM's Docling represents a significant shift in document processing - using computer vision instead of traditional OCR. We've added comprehensive documentation to help you evaluate and integrate it.
Chandra OCR Benchmark Data
- Added Chandra OCR 0.1.0 benchmark results
- Top performer on olmOCR-Bench at 83.1%
- Comparison data against PaddleOCR-VL, MinerU, Marker
Chandra OCR now leads the olmOCR-Bench leaderboard, the Allen Institute for AI's OCR benchmark suite. We've added comprehensive benchmark data to help you compare it against other solutions.
Document Scanner Tutorial
- Full document scanning pipeline with OpenCV
- Edge detection, perspective correction, enhancement
- Interactive demo with sample images
- Integration guide with OCR engines
Learn to build a document scanner that detects edges, corrects perspective, and enhances scanned images. Includes full Python code and interactive examples.
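The heart of the pipeline is the four-point perspective transform. A minimal sketch with OpenCV; the contour detection that finds the page corners is omitted here, and this is not the tutorial's exact code:

```python
import cv2
import numpy as np

def four_point_warp(image, corners):
    """Warp a quadrilateral region to a flat, top-down view.

    corners: float32 array of shape (4, 2), ordered tl, tr, br, bl.
    """
    tl, tr, br, bl = corners
    width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]], dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(corners, dst)
    return cv2.warpPerspective(image, matrix, (width, height))
```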
Initial Launch
- OCR benchmark leaderboard with 8 major benchmarks
- State-of-the-art results from 50+ models
- Methodology documentation
- Comparison pages: PaddleOCR vs Tesseract, GPT-4o vs PaddleOCR
CodeSOTA launches with comprehensive OCR benchmarking data. Our goal: verify vendor claims independently and help you choose the right tools.
This is the complete changelog since launch. Star us on GitHub for updates.