Document Parsing
Parsing document structure and content
Document parsing is a computer vision task: recovering the structure and content of a document from its pages. Below are the standard benchmarks used to evaluate models, along with current state-of-the-art results.
Benchmarks & SOTA
olmOCR-Bench
7,010 unit tests across 1,402 PDF documents. Tests parsing of tables, math, multi-column layouts, old scans, and more.
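As a rough illustration of what unit-test style scoring can look like, the sketch below runs pass/fail checks against a parser's text output and reports the percentage passed. It is a minimal sketch under stated assumptions, not the benchmark's actual harness: `ParseTest`, `text_present`, `appears_before`, `pass_rate`, and the sample data are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ParseTest:
    """One pass/fail check against a parser's text output (hypothetical structure)."""
    doc_id: str
    description: str
    check: Callable[[str], bool]


def text_present(snippet: str) -> Callable[[str], bool]:
    """Check that a snippet appears in the output, ignoring case and whitespace runs."""
    needle = " ".join(snippet.lower().split())
    return lambda output: needle in " ".join(output.lower().split())


def appears_before(first: str, second: str) -> Callable[[str], bool]:
    """Check reading order: `first` must occur before `second` in the output."""
    def check(output: str) -> bool:
        i, j = output.find(first), output.find(second)
        return i != -1 and j != -1 and i < j
    return check


def pass_rate(tests: List[ParseTest], outputs: Dict[str, str]) -> float:
    """Percentage of tests passed; a document with no output fails all its tests."""
    if not tests:
        return 0.0
    passed = sum(1 for t in tests if t.check(outputs.get(t.doc_id, "")))
    return 100.0 * passed / len(tests)


# Hypothetical parser output and tests, for illustration only.
outputs = {"doc-1": "Introduction\nLeft column text.\nRight column text."}
tests = [
    ParseTest("doc-1", "heading kept", text_present("introduction")),
    ParseTest("doc-1", "column order", appears_before("Left column", "Right column")),
    ParseTest("doc-1", "footnote kept", text_present("footnote 3")),
    ParseTest("doc-2", "missing output", text_present("anything")),
]
score = pass_rate(tests, outputs)
print(f"{score:.1f}% of tests passed")
```

Small binary checks like these make a score easy to audit, since each failure points to one concrete property of one document. The trade-off is that the score depends on how the tests are distributed across document types.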
State of the Art
Chandra v0.1.0 (datalab-to): 99.9 (base)
OmniDocBench v1.5
981 annotated PDF pages across 9 document categories. Tests end-to-end document parsing including text, tables, and formulas.
State of the Art
MinerU 2.5 (OpenDataLab): 97.5 (layout-map)
Related Tasks
General OCR Capabilities
Comprehensive benchmarks covering multiple aspects of OCR performance.
Polish OCR
OCR for Polish-language text, including historical documents, Gothic typefaces, and diacritic recognition.
Image Classification
Categorizing images into predefined classes (ImageNet, CIFAR).
Object Detection
Locating and classifying objects in images (COCO, Pascal VOC).