Codesota · Registry · Computer Vision · The area-level register · Issue: April 22, 2026
Area hub · Computer Vision

Computer vision,
measured.

Pixels in, structure out: classification, detection, segmentation, and depth. This is the area with the oldest leaderboards in the register, and most of its headline numbers are saturating in public view.

Computer vision in 2026 looks nothing like 2023. Foundation models (DINOv2, SAM 3) have replaced task-specific training for most pipelines. NMS-free detection (YOLO26, RF-DETR) is the new production standard. Open-source rivals proprietary across every task. The bottleneck has shifted from models to data, deployment, and evaluation on your actual domain.

§ 01 · Top tasks

Sub-tasks in computer vision.

Each task opens onto a leaderboard of its canonical benchmark, with the full submission history and dated scores. Tasks without an indexed result are listed elsewhere in the register; the table below is sorted by result count.

Fig 01 · Showing top 12 of 27 tasks under Computer Vision.
§ 02 · Top benchmarks

Current state of the art.

Leading scores for the headline benchmarks in this area, drawn from the registry. Shaded rows mark the top result per task; follow any row into the full leaderboard.

# · Task · Benchmark · Leading model · Score · Note
01 · Image Classification · ImageNet-1K · CoCa · 91.0% top-1 · Benchmark saturated; focus shifting to robustness variants
02 · Object Detection · COCO test-dev · ScyllaNet · 66.0 AP · RF-DETR: 60+ AP real-time (<5 ms)
03 · Object Detection (open-vocab) · LVIS-minival · DINO-X Pro · 59.8 AP · Zero-shot, no LVIS training
04 · Semantic Segmentation · ADE20K · InternImage-H · 62.9 mIoU · 1.08B params
05 · Panoptic Segmentation · COCO · SAM 3 · SOTA · Also: open-vocab + video tracking
06 · Depth Estimation · Multi-view · Depth Anything 3 · +44% vs VGGT · Single DINOv2 transformer, any number of views
07 · Image Generation · ImageNet-256 FID · DiT variant · 1.35 FID · FLUX.2 best open-source for text-to-image
08 · Video Understanding · Kinetics-400 · InternVideo 2.5 · ~92% · Multimodal, SOTA across 39 video datasets
Fig 02 · Headline benchmarks for Computer Vision. Full leaderboards, dated history and reproduction status live on the task pages.
Side note

State of the Field (2026)

  • 01 · DINOv2 is the default backbone, used by RF-DETR (detection), Depth Anything 3 (depth), and SAM 3 (segmentation). It's the new ImageNet-pretrained ResNet (see the sketch after this list).
  • 02 · SAM 3 (Meta, Nov 2025) does open-vocabulary detection + segmentation + video tracking from text prompts. The 'GPT moment' for segmentation.
  • 03 · DINO-X achieves 56.0 AP on COCO zero-shot, with no training on COCO at all, and 59.8 AP on LVIS-minival. The best open-set detector, period.
  • 04 · RF-DETR is the first real-time model to exceed 60 AP on COCO. 54.7% mAP at <5ms latency on a T4 GPU.
  • 05 · YOLO26 (Sep 2025) removes NMS entirely. 43% faster CPU inference than YOLO11. Purpose-built for edge deployment.
  • 06 · ImageNet top-1 is 91% (CoCa). COCO AP is 66% (ScyllaNet). Further gains cost orders of magnitude more compute for diminishing returns.
  • 07 · The line between 'vision model' and 'vision-language model' has dissolved. SAM 3, InternVL3.5, DINO-X all accept text prompts natively.
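A minimal sketch of the backbone pattern from item 01, using the DINOv2 weights Meta publishes through torch.hub to pull a global image embedding. The image path and the ViT-B/14 variant are illustrative; task heads (detection, depth, segmentation) are trained on top of features like these.

    import torch
    from PIL import Image
    from torchvision import transforms

    # Hypothetical local image; any RGB image works.
    image = Image.open("sample.jpg").convert("RGB")

    # DINOv2 ViT-B/14 backbone, published by Meta via torch.hub.
    backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
    backbone.eval()

    # ImageNet normalisation; input sides must be divisible by the patch size (14).
    preprocess = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    with torch.no_grad():
        features = backbone(preprocess(image).unsqueeze(0))  # (1, 768) global embedding

    print(features.shape)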
Key models

Names to watch.

  • SAM 3 (Meta) · Open-vocab detect + segment + track
  • DINO-X (IDEA Research) · Zero-shot detection (1200+ categories)
  • RF-DETR (Roboflow) · First real-time >60 AP on COCO
  • YOLO26 (Ultralytics) · NMS-free edge detection standard
  • DINOv2 (Meta) · Self-supervised visual features backbone
  • Depth Anything 3 (ByteDance) · Unified monocular + multi-view depth
  • InternVL 3.5 (OpenGVLab) · Best open-source VLM (72.2 MMMU)
  • FLUX.2 (Black Forest Labs) · Production-grade open image generation

Picks by use-case

What to reach for.

Editorial picks · not vendor rankings
Detection (production, known classes)
YOLO26 (edge) or RF-DETR (server)

YOLO26: NMS-free, 43% faster CPU. RF-DETR: first >60 AP real-time. Fine-tune on your data. Always.
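A rough sketch of that fine-tuning step with the Ultralytics API. The dataset YAML, checkpoint name and epoch count are placeholders; substitute the YOLO26 weights under whatever name the release uses.

    from ultralytics import YOLO

    # "defects.yaml" is a hypothetical dataset config in the standard
    # Ultralytics YAML format (train/val paths plus class names).
    model = YOLO("yolo11n.pt")  # swap in the YOLO26 checkpoint once you have it
    model.train(data="defects.yaml", epochs=100, imgsz=640)

    # Validate, then export for edge deployment.
    metrics = model.val()
    model.export(format="onnx")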

Detection (open-vocabulary)
DINO-X Pro or Grounding DINO 1.6

Best zero-shot accuracy. Use as a labelling assistant, then train YOLO for production.
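A hedged sketch of the labelling-assistant workflow using the older Grounding DINO checkpoints published on the Hugging Face Hub (DINO-X Pro and Grounding DINO 1.6 sit behind APIs). The image path and prompt classes are placeholders, and post-processing argument names vary slightly across transformers versions.

    import torch
    from PIL import Image
    from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

    model_id = "IDEA-Research/grounding-dino-base"
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

    image = Image.open("frame.jpg").convert("RGB")
    # Class prompts are lower-case phrases, each terminated with a period.
    text = "forklift. pallet. person."

    inputs = processor(images=image, text=text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Threshold names differ by transformers version; check your installed docs.
    results = processor.post_process_grounded_object_detection(
        outputs, inputs.input_ids,
        box_threshold=0.35, text_threshold=0.25,
        target_sizes=[image.size[::-1]],
    )
    print(results[0]["boxes"], results[0]["labels"])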

Segmentation
SAM 3 (interactive) or Mask2Former (production)

SAM 3 for annotation and prompting. Mask2Former/OneFormer fine-tuned for deployment metrics.
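For the production half, a minimal sketch of running a public ADE20K Mask2Former checkpoint through transformers. In practice you would swap in a checkpoint fine-tuned on your own label set; the image path is a placeholder.

    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

    ckpt = "facebook/mask2former-swin-large-ade-semantic"
    processor = AutoImageProcessor.from_pretrained(ckpt)
    model = Mask2FormerForUniversalSegmentation.from_pretrained(ckpt)

    image = Image.open("scene.jpg").convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Per-pixel class ids, resized back to the input resolution.
    seg_map = processor.post_process_semantic_segmentation(
        outputs, target_sizes=[image.size[::-1]]
    )[0]
    print(seg_map.shape)  # (H, W) tensor of ADE20K class ids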

Depth estimation
Depth Anything V2 (single image) or V3 (multi-view)

Production-ready, fast, well-supported. Metric3D v2 if you need absolute scale for robotics.
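A short sketch of single-image relative depth via the transformers depth-estimation pipeline. The exact Depth Anything V2 repo id is an assumption worth checking on the Hub, and the output is relative, not metric, depth.

    from PIL import Image
    from transformers import pipeline

    # Repo id assumed; confirm the converted checkpoint name on the Hub.
    depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

    image = Image.open("room.jpg").convert("RGB")
    result = depth(image)
    result["depth"].save("room_depth.png")  # PIL image of relative depth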

Vision-language understanding
InternVL3.5 (open-source) or GPT-4o (API)

InternVL3.5: 72.2 MMMU, runs locally. GPT-4o: best reasoning but 100x cost. Gemini 2.0 Flash for high-volume.
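For the API route, a minimal sketch of asking GPT-4o one question about one image with the OpenAI Python SDK; the image path and prompt are placeholders.

    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    with open("chart.png", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)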

Image generation
FLUX.2 (local) or SD3.5 (ecosystem)

FLUX.2 rivals proprietary quality. SD3.5 has the LoRA/ControlNet ecosystem. SDXL still best for low VRAM.
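A minimal local-generation sketch with diffusers, shown against SD3.5 since FLUX checkpoints ship their own pipeline classes (FluxPipeline for FLUX.1; check the FLUX.2 model card for its equivalent). Prompt and sampler settings are illustrative.

    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
    ).to("cuda")

    image = pipe(
        "a macro photo of a circuit board, shallow depth of field",
        num_inference_steps=28,
        guidance_scale=3.5,
    ).images[0]
    image.save("out.png")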

Editor's note

Honest takes.

Classification and clean-doc OCR are solved. Move on.

91% top-1 on ImageNet. Real-time detection at ~55 AP in under 5 ms. Monocular depth is production-ready. Stop optimising saturated benchmarks and focus on your actual domain gap.

Zero-shot is a starting point, not an endpoint

DINO-X gets 56 AP zero-shot on COCO. A fine-tuned YOLO26 will beat it on your specific domain every time. Use zero-shot for labelling and prototyping, then train a specialist for production.

The bottleneck is data and deployment, not models

Foundation models are good enough. The real work is getting labelled data for your domain (industrial defects, medical images, satellite), then quantising and distilling for your hardware.

3D vision is still 5 years behind 2D

Depth maps look cool in demos. In production, you need multi-view or LiDAR for anything safety-critical. No single foundation model does 3D as well as DINOv2 does 2D features.

FID scores for image generation are meaningless

FID doesn't capture what humans care about — coherence, prompt following, aesthetics. FLUX.1 'feels' better than models with lower FID. Trust human evals, not automated metrics.
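For the record, FID reduces each image set to the mean and covariance of Inception-v3 features and measures the Fréchet distance between the two Gaussians:

    \mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)

where (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the real and generated feature statistics. Nothing in that expression sees the prompt, so prompt adherence and global coherence cannot move the score.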

§ 03 · Method
How this area is tracked

Every row in this register is dated and sourced.

The benchmarks above come from the same Postgres registry that powers the wider Codesota index. Each task has exactly one canonical dataset. Each score carries a metric direction, a date and — where possible — a reproduction status.
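As a rough illustration of the shape each row takes (field names here are assumptions for illustration, not the actual Postgres schema):

    from dataclasses import dataclass
    from datetime import date
    from typing import Literal, Optional

    @dataclass
    class ScoreRecord:
        task: str                 # e.g. "Object Detection"
        dataset: str              # the task's single canonical dataset, e.g. "COCO test-dev"
        model: str                # e.g. "ScyllaNet"
        metric: str               # e.g. "AP"
        direction: Literal["higher_better", "lower_better"]
        value: float
        reported: date            # when the score was published
        reproduction: Optional[Literal["reproduced", "unverified", "contested"]] = None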

When a score regresses, the prior record stays visible. When a benchmark is contested, we mark it rather than delete it. The goal is a register that argues in public.

Full methodology · The unified task index
In-depth guides

Further reading.

Image Segmentation: Models, Methods & Benchmarks

SAM 2 vs Mask2Former vs OneFormer — when to use which

Multimodal AI: State of Benchmarks

GPT-4o, Gemini, Claude, InternVL compared on MMMU, MathVista, more

Code Generation Models Compared

Claude Opus 4, GPT-5, Gemini 2.5 Pro, DeepSeek-V3

§ Final · Related

Neighbouring registers.

Sibling area hubs, the unified task index and the methodology that binds them.

Editorial invitation

Need help choosing?

We benchmark models on your actual data. Same methodology as CodeSOTA, your domain, your hardware constraints.

Book Assessment