Three stages, one forward pass.
Classical OCR is a pipeline of three modules. First a detector draws boxes around text regions; then a recogniser reads the pixels inside each box into characters; finally a post-processor corrects the output and resolves reading order. Each module can fail independently, and the errors compound.
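The modularity is easy to see in code. Below is a minimal sketch of the pipeline shape, assuming nothing about any particular library; the function names and the Box type are placeholders for illustration. The compounding is the key weakness: end-to-end accuracy is roughly the product of the per-stage accuracies.

```python
from typing import Callable

Box = tuple[int, int, int, int]  # (x, y, w, h) in pixel coordinates

def run_pipeline(
    image: bytes,
    detect: Callable[[bytes], list[Box]],
    recognise: Callable[[bytes, Box], str],
    postprocess: Callable[[list[str]], str],
) -> str:
    boxes = detect(image)                             # stage 1: where is the text?
    texts = [recognise(image, box) for box in boxes]  # stage 2: what does it say?
    return postprocess(texts)                         # stage 3: correct and order it

# If each stage is 95% accurate, the chain is roughly 0.95**3, about 86%,
# and a region the detector misses can never be recovered downstream.
```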
Detection granularity has shifted from words to lines to whole regions. Word-level detection, the CRAFT/EAST tradition, still dominates scene text; line-level is the norm for documents. Region-level detection is where modern vision-language models thrive: they see entire paragraphs as semantic units and preserve layout without a separate analysis step.
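To make the granularity shift concrete, here is a hedged sketch of how line-level output relates to word-level output: word boxes can be clustered into lines by vertical overlap. The Box shape and the 0.5 overlap threshold are assumptions for illustration, not any detector's actual API.

```python
Box = tuple[float, float, float, float]  # (x, y, w, h)

def vertical_overlap(a: Box, b: Box) -> float:
    """Fraction of the shorter box's height shared with the other box."""
    top = max(a[1], b[1])
    bottom = min(a[1] + a[3], b[1] + b[3])
    return max(0.0, bottom - top) / min(a[3], b[3])

def words_to_lines(words: list[Box], threshold: float = 0.5) -> list[list[Box]]:
    lines: list[list[Box]] = []
    for w in sorted(words, key=lambda b: (b[1], b[0])):  # top-down, left-right
        for line in lines:
            if vertical_overlap(line[-1], w) >= threshold:
                line.append(w)   # same baseline: extend the line
                break
        else:
            lines.append([w])    # no overlapping line: start a new one
    return lines

words = [(10, 12, 40, 10), (55, 10, 30, 12), (12, 40, 60, 11)]
print(len(words_to_lines(words)))  # -> 2 lines
```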
Recognition used to be CTC (Connectionist Temporal Classification), which is fast but scores each frame of the image independently, with no language context tying the characters together. Attention-based decoders, standard since around 2018, let the model condition each character on the whole image and on the characters it has already emitted. That is why modern OCR finally stops confusing “rn” with “m” and “l” with “1”.
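A minimal sketch of CTC greedy decoding makes the independence problem visible: collapse repeated labels, drop blanks, done. The toy alphabet, blank index, and logits below are made up for illustration, not any library's API.

```python
import numpy as np

BLANK = 0
ALPHABET = {1: "r", 2: "n", 3: "m"}  # toy label set

def ctc_greedy_decode(logits: np.ndarray) -> str:
    """logits: (timesteps, num_labels) per-frame scores."""
    best = logits.argmax(axis=1)  # each frame decided independently
    collapsed = [k for i, k in enumerate(best) if i == 0 or k != best[i - 1]]
    return "".join(ALPHABET[k] for k in collapsed if k != BLANK)

# Because frames are scored independently, the two strokes of an "m" can
# decode as "r" then "n", and nothing downstream of the frames vetoes it.
frames = np.array([
    [0.1, 0.8, 0.05, 0.05],  # frame looks like "r"
    [0.6, 0.2, 0.1, 0.1],    # blank
    [0.1, 0.05, 0.8, 0.05],  # frame looks like "n"
])
print(ctc_greedy_decode(frames))  # -> "rn", even if the glyph was "m"
```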
Post-processing is the unsexy part: language-model correction (“teh” to “the”), layout analysis (read the left column before the right), table structure recognition (scored by TEDS, the Tree-Edit-Distance-based Similarity metric), and confidence filtering. It is also where traditional pipelines most often embarrass themselves.
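Two of those steps fit in a few lines. The sketch below combines confidence filtering with a naive two-column reading-order sort; the Block shape, the page-midpoint column split, and the 0.5 threshold are assumptions for illustration, and real layout analysis is considerably messier.

```python
from dataclasses import dataclass

@dataclass
class Block:
    x: float          # left edge, in page coordinates
    y: float          # top edge
    text: str
    confidence: float

def reading_order(blocks: list[Block], page_width: float,
                  min_conf: float = 0.5) -> list[Block]:
    kept = [b for b in blocks if b.confidence >= min_conf]  # confidence filter
    # Assign each block to a column by the page midpoint, then read
    # top-to-bottom within the left column before the right one.
    mid = page_width / 2
    return sorted(kept, key=lambda b: (b.x >= mid, b.y))

blocks = [
    Block(x=450, y=50, text="right column, first line", confidence=0.90),
    Block(x=30, y=400, text="left column, lower", confidence=0.92),
    Block(x=30, y=60, text="left column, upper", confidence=0.95),
    Block(x=450, y=10, text="smudge misread as text", confidence=0.20),
]
for b in reading_order(blocks, page_width=800):
    print(b.text)  # left upper, left lower, then right column
```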
The 2023–2026 shift is that vision-language models fold all three stages into a single forward pass. They read the document the way a literate human does — as one object, with layout, structure and language considered at once. That is why the top of Fig 1 is dominated by VLM-class open-source models and why Tesseract has quietly slid off the leaderboard.
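For a concrete instance of the single-pass pattern, here is a hedged sketch using Donut, an early OCR-free encoder-decoder model available through Hugging Face transformers; it assumes `transformers`, `torch`, `sentencepiece`, and `pillow` are installed and that a page image exists at page.png. One generate call covers detection, recognition, and structure at once; newer VLM-class models follow the same shape with a chat-style prompt.

```python
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

ckpt = "naver-clova-ix/donut-base-finetuned-cord-v2"
processor = DonutProcessor.from_pretrained(ckpt)
model = VisionEncoderDecoderModel.from_pretrained(ckpt)

image = Image.open("page.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# The task prompt selects the output schema the decoder was trained on.
decoder_input_ids = processor.tokenizer(
    "<s_cord-v2>", add_special_tokens=False, return_tensors="pt"
).input_ids

# One call: no separate detector, recogniser, or layout module.
outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```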


