What are you trying to extract?
Pick your document type. See what actually works.
Invoices & Receipts
Extract line items, totals, vendor info into structured data
Handwritten Notes
Forms, signatures, meeting notes, historical documents
PDFs & Reports
Multi-page documents, preserve layout, tables, headers
Photos & Screenshots
Camera captures, screen grabs, social media images
Scanned Books
Digitize printed text, old documents, archives
ID Cards & Passports
KYC verification, identity documents, MRZ codes
We Run Our Own Benchmarks
No vendor claims. Real results. Independently verified.
While others copy numbers from marketing pages, we run the actual benchmarks ourselves. Full datasets. Official evaluation tools. Reproducible results.
Do you have constraints?
Deep Dives & Techniques
How Docling Works
Understand the architecture of IBM's document understanding library. Why VLM pipelines outperform traditional OCR for complex layouts.
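For orientation, a minimal sketch of the basic Docling conversion flow the deep dive walks through. The class and method names follow Docling's published quickstart (DocumentConverter, convert, export_to_markdown); the input path is a placeholder, and the current docs may differ in detail.

```python
from docling.document_converter import DocumentConverter

# Convert a PDF (path or URL) and export the parsed document as Markdown.
converter = DocumentConverter()
result = converter.convert("report.pdf")  # placeholder input
print(result.document.export_to_markdown())
```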
Interactive OCR Correction
A case study on handling OCR "flicker" (H vs N) and camera drift in mobile apps using Google MLKit and centroid anchoring.
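The core idea of centroid anchoring can be sketched independently of any mobile SDK: match each new OCR detection to the nearest previously tracked text box by centroid distance, follow small camera drift, and only accept a changed reading (the H vs N flicker) once it has persisted for a few frames. The Python below is an illustrative sketch with made-up names and thresholds, not MLKit code.

```python
import math

MATCH_RADIUS_PX = 40   # max centroid drift to treat two boxes as the same element (assumed)
STABLE_FRAMES = 3      # frames a new reading must persist before replacing the old one (assumed)

class TrackedText:
    def __init__(self, centroid, text):
        self.centroid = centroid
        self.text = text          # last accepted (stable) reading
        self.candidate = text     # latest raw reading from the OCR engine
        self.candidate_count = 0

def centroid(box):
    xs, ys = zip(*box)            # box = [(x, y), ...] corner points
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def update(tracks, detections):
    """detections: list of (box, raw_text) produced by the OCR engine for one frame."""
    for box, raw_text in detections:
        c = centroid(box)
        # anchor to the nearest existing track within the match radius
        nearest = min(tracks, key=lambda t: math.dist(t.centroid, c), default=None)
        if nearest is None or math.dist(nearest.centroid, c) > MATCH_RADIUS_PX:
            tracks.append(TrackedText(c, raw_text))
            continue
        nearest.centroid = c      # follow camera drift
        if raw_text == nearest.candidate:
            nearest.candidate_count += 1
        else:
            nearest.candidate, nearest.candidate_count = raw_text, 1
        # only accept a changed reading (e.g. H -> N flicker) once it is stable
        if nearest.candidate_count >= STABLE_FRAMES:
            nearest.text = nearest.candidate
    return tracks
```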
Open Source OCR Benchmark
Run on your own servers. No API costs. Full data privacy.
| Model | OmniDocBench | OCRBench (EN) | olmOCR-Bench | License |
|---|---|---|---|---|
| PaddleOCR-VL Baidu | 92.86 | - | 80.0 | Apache 2.0 |
| PaddleOCR-VL 0.9B Baidu | 92.56 | - | - | Apache 2.0 |
| MinerU 2.5 OpenDataLab | 90.67 | - | 75.2 | AGPL-3.0 |
| Qwen3-VL-235B Alibaba | 89.15 | - | - | Qwen License |
| MonkeyOCR-pro-3B Unknown | 88.85 | - | - | Apache 2.0 / MIT |
| OCRVerse 4B Unknown | 88.56 | - | - | Apache 2.0 / MIT |
| dots.ocr 3B Unknown | 88.41 | - | 79.1 | Apache 2.0 / MIT |
| Qwen2.5-VL Alibaba | 87.02 | - | - | Apache 2.0 |
| Chandra v0.1.0 datalab-to | - | - | 83.1 | Apache 2.0 |
| Infinity-Parser 7B Unknown | - | - | 82.5 | Apache 2.0 / MIT |
| olmOCR v0.4.0 Allen AI | - | - | 82.4 | Apache 2.0 |
| Marker 1.10.0 VikParuchuri | - | - | 76.5 | Apache 2.0 / MIT |
| Marker 1.10.1 VikParuchuri | - | - | 76.1 | Apache 2.0 / MIT |
| DeepSeek OCR DeepSeek | - | - | 75.4 | Apache 2.0 / MIT |
| GPT-4o (Anchored) OpenAI | - | - | 69.9 | Proprietary |
| Nanonets OCR2 3B Nanonets | - | - | 69.5 | Apache 2.0 / MIT |
| Gemini Flash 2 Google | - | - | 63.8 | Proprietary |
| Qwen3-Omni-30B Alibaba | - | 61.3% | - | Qwen License |
| Nemotron Nano V2 VL NVIDIA | - | 61.2% | - | NVIDIA Open Model License |
| Qwen2.5-VL 72B Alibaba | - | - | - | Apache 2.0 |
| CHURRO (3B) Stanford | - | - | - | Apache 2.0 / MIT |
| InternVL2-76B Shanghai AI Lab | - | - | - | MIT |
| InternVL3-78B Shanghai AI Lab | - | - | - | Apache 2.0 / MIT |
| Tesseract Google (Open Source) | - | - | - | Apache 2.0 |
| EasyOCR JaidedAI | - | - | - | Apache 2.0 |
| Gemini 2.5 Flash Google | - | - | - | Proprietary |
| olmOCR v0.3.0 Allen AI | - | - | - | Apache 2.0 / MIT |
| Qwen2-VL 72B Alibaba | - | - | - | Apache 2.0 / MIT |
| Qwen2.5-VL 32B Alibaba | - | - | - | Apache 2.0 / MIT |
| AIN 7B Research | - | - | - | Apache 2.0 / MIT |
| GPT-4o Mini OpenAI | - | - | - | Proprietary |
| Azure OCR Microsoft | - | - | - | Proprietary |
| PaddleOCR Baidu | - | - | - | Apache 2.0 / MIT |
| InternVL3 14B OpenGVLab | - | - | - | Apache 2.0 / MIT |
| o1-preview OpenAI | - | - | - | Proprietary |
| Llama 3 70B Meta | - | - | - | Llama 3 Community License |
| DeepSeek V3 DeepSeek | - | - | - | Apache 2.0 / MIT |
| DeepSeek V2.5 DeepSeek | - | - | - | Apache 2.0 / MIT |
| Claude 3.5 Opus Anthropic | - | - | - | Proprietary |
- Sensitive data that can't leave your network
- High volume processing (no per-page costs)
- Offline/air-gapped environments
- Full control over the pipeline
Vendor API Benchmark
Pay per page. Fast to integrate. Enterprise support available.
| Model | OmniDocBench | OCRBench (EN) | olmOCR-Bench | Price/1k pages |
|---|---|---|---|---|
| Gemini 2.5 Pro Google | 88.03 | 59.3% | - | varies |
| Mistral OCR 3 Mistral | 79.75 | - | 78.0 | varies |
| Mistral OCR 2 Mistral | - | - | 72.0 | varies |
| Seed1.6-vision ByteDance | - | 62.2% | - | varies |
| GPT-4o OpenAI | - | 55.5% | - | varies |
| clearOCR TeamQuest | 31.70 | - | - | varies |
| Gemini 2.0 Flash Google | - | - | - | varies |
| Gemini 1.5 Pro Google | - | - | - | varies |
| Claude Sonnet 4 Anthropic | - | - | - | varies |
| Claude 3.5 Sonnet Anthropic | - | - | - | varies |
- Need reasoning/context understanding (GPT-4o, Gemini)
- Low volume, occasional use
- Need enterprise SLA/support
- No infrastructure to maintain
CodeSOTA Score: Cross-Benchmark Comparison
One number to compare models across all benchmarks. Weighted average: primary benchmarks (3x), secondary (2x), language-specific (1x).
| # | Model | Score | Coverage | OmniDoc *** | OCRBench *** | olmOCR *** | CHURRO ** | CC-OCR ** | KITAB * | ThaiOCR * | VideoOCR * |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | paddleocr-vl | 86.4 | 2/8 | 93 | -- | 80 | -- | -- | -- | -- | -- |
| 2 | dots-ocr-3b | 83.8 | 2/8 | 88 | -- | 79 | -- | -- | -- | -- | -- |
| 3 | mineru-2.5 | 82.9 | 2/8 | 91 | -- | 75 | -- | -- | -- | -- | -- |
| 4 | mistral-ocr-3 | 78.9 | 2/8 | 80 | -- | 78 | -- | -- | -- | -- | -- |
| 5 | gemini-15-pro | 77.1 | 2/8 | -- | -- | -- | -- | 83 | -- | -- | 65 |
| 6 | gemini-25-pro | 72.0 | 5/8 | 88 | 59 | -- | 64 | -- | -- | 77 | 74 |
| 7 | qwen25-vl-32b | 68.8 | 2/8 | -- | -- | -- | -- | -- | -- | 77 | 61 |
| 8 | qwen25-vl-72b | 62.5 | 3/8 | -- | -- | -- | 55 | -- | -- | 72 | 69 |
| 9 | gpt-4o | 58.1 | 5/8 | -- | 56 | -- | 34 | 76 | 69 | -- | 66 |
| 10 | claude-sonnet-4 | 52.7 | 2/8 | -- | -- | -- | 37 | -- | -- | 84 | -- |
| 11 | paddleocr-vl-0.9b | -- | 1/8 | 93 | -- | -- | -- | -- | -- | -- | -- |
| 12 | qwen3-vl-235b | -- | 1/8 | 89 | -- | -- | -- | -- | -- | -- | -- |
| 13 | monkeyocr-pro-3b | -- | 1/8 | 89 | -- | -- | -- | -- | -- | -- | -- |
| 14 | qwen25-vl | -- | 1/8 | 87 | -- | -- | -- | -- | -- | -- | -- |
| 15 | ocrverse-4b | -- | 1/8 | 89 | -- | -- | -- | -- | -- | -- | -- |
| 16 | clearocr-teamquest | -- | 1/8 | 32 | -- | -- | -- | -- | -- | -- | -- |
| 17 | seed-1.6-vision | -- | 1/8 | -- | 62 | -- | -- | -- | -- | -- | -- |
| 18 | qwen3-omni-30b | -- | 1/8 | -- | 61 | -- | -- | -- | -- | -- | -- |
| 19 | nemotron-nano-v2-vl | -- | 1/8 | -- | 61 | -- | -- | -- | -- | -- | -- |
| 20 | chandra-ocr-0.1.0 | -- | 1/8 | -- | -- | 83 | -- | -- | -- | -- | -- |
| 21 | deepseek-ocr | -- | 1/8 | -- | -- | 75 | -- | -- | -- | -- | -- |
| 22 | marker-1.10.0 | -- | 1/8 | -- | -- | 77 | -- | -- | -- | -- | -- |
| 23 | gpt-4o-anchored | -- | 1/8 | -- | -- | 70 | -- | -- | -- | -- | -- |
| 24 | gemini-flash-2 | -- | 1/8 | -- | -- | 64 | -- | -- | -- | -- | -- |
| 25 | infinity-parser-7b | -- | 1/8 | -- | -- | 83 | -- | -- | -- | -- | -- |
TODO: Priority benchmarks to run
Open source models are prioritized (they can be run locally without API costs).
OSS = open source (run locally) | API = vendor API
OmniDocBench: End-to-end document parsing composite score. OCRBench v2: Overall score across 8 OCR capabilities.
Data from AlphaXiv + Papers With Code.
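The CodeSOTA Score above is a plain weighted average over whichever benchmarks a model has results for, using the 3x/2x/1x weights listed. The sketch below uses illustrative names; fed the rounded values shown in the table, it matches the listed score for gemini-25-pro (72.0).

```python
# Weights per benchmark tier, as described above: primary 3x, secondary 2x, language-specific 1x.
WEIGHTS = {
    "OmniDocBench": 3, "OCRBench": 3, "olmOCR-Bench": 3,   # primary
    "CHURRO": 2, "CC-OCR": 2,                               # secondary
    "KITAB": 1, "ThaiOCR": 1, "VideoOCR": 1,                # language/domain-specific
}

def codesota_score(results: dict[str, float]) -> float | None:
    """Weighted average over the benchmarks a model actually has results for."""
    scored = {b: s for b, s in results.items() if b in WEIGHTS and s is not None}
    if not scored:
        return None
    total_weight = sum(WEIGHTS[b] for b in scored)
    return sum(WEIGHTS[b] * s for b, s in scored.items()) / total_weight

# Example: gemini-25-pro from the table above -> 72.0
print(round(codesota_score({
    "OmniDocBench": 88, "OCRBench": 59, "CHURRO": 64, "ThaiOCR": 77, "VideoOCR": 74
}), 1))
```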
Models
PaddleOCR-VL
OSS · Baidu
#1 on OmniDocBench
PaddleOCR-VL 0.9B
OSS · Baidu
Lightweight version
MinerU 2.5
OSS · OpenDataLab
#1 on layout detection (97.5 mAP)
Qwen3-VL-235B
OSS · Alibaba
Large model, requires significant compute
Gemini 2.5 Pro
API · Google
#1 on OCRBench v2 Chinese and MME-VideoOCR
Gemini 2.0 Flash
API · Google
#1 on KITAB-Bench (Arabic)
Gemini 1.5 Pro
API · Google
#1 on CC-OCR Multi-Scene
Qwen2.5-VL 72B
OSS · Alibaba
GPT-4o
API · OpenAI
Best OCR edit distance on OmniDocBench (0.02)
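The OmniDocBench figure above is an edit distance, where lower is better and 0 means a perfect match. For reference, a minimal sketch of a normalized edit distance as commonly used in OCR evaluation; normalization conventions vary (this one divides by the longer string), so it is not necessarily OmniDocBench's exact protocol.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_edit_distance(prediction: str, reference: str) -> float:
    """Edit distance divided by the longer string's length; 0.0 means an exact match."""
    if not prediction and not reference:
        return 0.0
    return levenshtein(prediction, reference) / max(len(prediction), len(reference))

print(normalized_edit_distance("lnvoice #1234", "Invoice #1234"))  # ~0.08 for one wrong character
```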
Seed1.6-vision
API · ByteDance
#1 on OCRBench v2 English
Chandra v0.1.0
OSS · datalab-to
#1 on olmOCR-Bench (83.1). Best on the old scans math, long tiny text, and base accuracy categories.
OCRVerse 4B
OSS · Unknown
Strong OmniDocBench performer (88.56)
dots.ocr 3B
OSS · Unknown
Best table TEDS among 3B models. Also #1 on olmOCR tables (88.3)
CHURRO (3B)
OSS · Stanford
#1 on CHURRO-DS (82.3 printed, 70.1 handwritten)
Claude Sonnet 4
API · Anthropic
#1 on ThaiOCRBench
Claude 3.5 Sonnet
API · Anthropic
Lowest hallucination rate on CC-OCR (0.09%)
Tesseract
OSS · Google
Classic open-source OCR engine
EasyOCR
OSS · JaidedAI
80+ languages supported
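Both engines above have simple Python entry points. A minimal sketch, assuming the pytesseract and easyocr packages plus a local Tesseract install; page.png is a placeholder image.

```python
from PIL import Image
import pytesseract   # requires a local Tesseract install
import easyocr       # pip install easyocr

# Tesseract: plain text out, good for clean printed scans
text = pytesseract.image_to_string(Image.open("page.png"), lang="eng")

# EasyOCR: returns (bounding box, text, confidence) per detected line
reader = easyocr.Reader(["en"])          # downloads detection/recognition models on first run
results = reader.readtext("page.png")
for box, detected, confidence in results:
    print(f"{confidence:.2f}  {detected}")
```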
DeepSeek OCR
OSS · DeepSeek
DeepSeek's OCR model for document understanding.
Marker 1.10.0
OSS · VikParuchuri
Open-source PDF to Markdown converter.
Marker 1.10.1
OSS · VikParuchuri
Latest version of Marker PDF parser.
GPT-4o (Anchored)
API · OpenAI
GPT-4o with anchored prompting for OCR.
olmOCR v0.3.0
OSS · Allen AI
Earlier version of olmOCR.
Mistral OCR 3
API · Mistral
Latest Mistral OCR (Dec 2025). 74% win rate vs OCR 2. Claims 94.9% accuracy. Markdown + HTML table output. $1/1000 pages with batch API.
clearOCR
API · TeamQuest
Polish OCR service. Text extraction only - no table/formula recognition. Best for simple documents. VERIFIED by CodeSOTA: 84.6% text accuracy, but 0.8% table TEDS due to lack of structure recognition.
Mistral OCR 2
API · Mistral
Previous version of Mistral OCR API.
Qwen2-VL 72B
OSS · Alibaba
Qwen2's large vision-language model.
Qwen2.5-VL 32B
OSS · Alibaba
Qwen2.5 32B vision-language model.
GPT-4o Mini
API · OpenAI
Smaller, faster version of GPT-4o.
Azure OCR
API · Microsoft
Microsoft Azure's OCR service.
PaddleOCR
OSS · Baidu
Open-source OCR from PaddlePaddle.
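A minimal usage sketch for the classic PaddleOCR pipeline above, following the long-standing 2.x Python API; newer releases and the PaddleOCR-VL model use different entry points, so treat this as illustrative.

```python
from paddleocr import PaddleOCR  # pip install paddleocr paddlepaddle

# Classic PaddleOCR pipeline: detection + angle classification + recognition
ocr = PaddleOCR(use_angle_cls=True, lang="en")
result = ocr.ocr("invoice.png", cls=True)   # placeholder image

for line in result[0]:                 # one entry per detected text line
    box, (text, confidence) = line     # box = 4 corner points
    print(f"{confidence:.2f}  {text}")
```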
InternVL3 14B
OSS · OpenGVLab
InternVL3 14B vision-language model.
Llama 3 70B
OSS · Meta
Meta's Llama 3 70B model.
Claude 3.5 Opus
API · Anthropic
Anthropic's Claude 3.5 Opus model.
Have benchmark results?
Submit your paper or benchmark results. We verify and add them to our database.
Submit Paper
Get OCR updates
New models, benchmark results, and practical guides.
No spam. Unsubscribe anytime.
About This Data
All benchmark results are sourced from AlphaXiv benchmark leaderboards. Each data point includes the source URL and access date for verification.
Results marked as "pending verification" are claimed in papers but have not been independently confirmed. We do not include estimated or interpolated values.