Document Parsing
Parsing document structure and content
Document parsing converts unstructured documents (PDFs, scans, photos) into structured, machine-readable formats (JSON, Markdown, HTML) — extracting text, tables, figures, and their relationships. It's the full pipeline: layout analysis + OCR + structure extraction + semantic understanding. Tools like Docling, Marker, and MinerU have made this practical for enterprise document processing.
History
Apache Tika and PDFMiner provide basic text extraction from digital PDFs, but lose formatting, tables, and spatial structure
Tabula and Camelot focus specifically on table extraction from PDFs, filling a critical gap in document parsing
LayoutLM combines OCR output with spatial position embeddings, enabling layout-aware document understanding for the first time
Donut (Kim et al.) introduces end-to-end document parsing without OCR — encoder reads the image, decoder generates structured text
Nougat (Meta) parses academic papers end-to-end from images to LaTeX/Markdown, handling equations and figures
Docling (IBM) and Marker (Surya-based) provide open-source production-quality PDF → Markdown/JSON pipelines
MinerU and Unstructured.io combine multiple models (layout detection, OCR, table extraction) into robust parsing pipelines
GPT-4o and Claude 3.5 Sonnet demonstrate that VLMs can parse documents directly from images with impressive accuracy, no specialized pipeline needed
Docling v2, Reducto, and SmolDocling combine specialized components with VLM-based verification; hybrid approaches dominate production
How Document Parsing Works
Document Ingestion
PDFs are rendered to images at 150-300 DPI. Digital-born PDFs may also have extractable text/structure via pdfminer/pdfplumber, but layout fidelity varies. Images (scans, photos) go directly to the vision pipeline.
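The routing decision can be sketched as a simple heuristic, assuming a per-page count of embedded text characters (e.g. as reported by a library like pdfplumber) is already available; real pipelines also check fonts and text-layer quality:

```python
def choose_pipeline(embedded_chars_per_page, min_chars=50):
    """Route each page: extract embedded text directly, or render the
    page to an image (150-300 DPI) for the OCR/vision pipeline.
    A scanned page exposes few or no embedded characters."""
    return [
        "embedded-text" if n >= min_chars else "vision"
        for n in embedded_chars_per_page
    ]

# Page 2 here is a scan with no text layer.
print(choose_pipeline([1200, 0, 830]))
# → ['embedded-text', 'vision', 'embedded-text']
```

The `min_chars` threshold is an assumption; in practice it is tuned per corpus, since some digital PDFs carry a broken or partial text layer.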
Layout Detection
A document layout model (DINO-DETR, YOLO-Doc) segments the page into regions: text blocks, tables, figures, headings, lists, headers/footers. Reading order is determined.
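Reading-order determination for a multi-column page can be sketched as a naive geometric sort, assuming each detected region is a `(x0, y0, x1, y1, label)` tuple in pixels. Production systems use learned reading-order models, but the core idea is the same: group regions into columns, then read each column top to bottom:

```python
def reading_order(regions, column_gap=40):
    """Sort layout regions (x0, y0, x1, y1, label) into reading order:
    columns left to right, then top to bottom within each column.
    Naive column split: a region whose left edge is more than
    `column_gap` px right of the current column starts a new column."""
    by_x = sorted(regions, key=lambda r: r[0])
    columns, col_x = [], None
    for r in by_x:
        if col_x is None or r[0] - col_x > column_gap:
            columns.append([])
            col_x = r[0]
        columns[-1].append(r)
    ordered = []
    for col in columns:
        ordered.extend(sorted(col, key=lambda r: r[1]))
    return ordered

# Two-column page: left column holds a heading above a table.
page = [
    (320, 80, 600, 200, "text"),
    (20, 300, 300, 400, "table"),
    (20, 60, 300, 280, "heading"),
]
print([r[4] for r in reading_order(page)])
# → ['heading', 'table', 'text']
```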
Text Extraction (OCR)
Each text region is OCR'd (Tesseract, PaddleOCR, Surya, or cloud APIs). For digital-born PDFs, embedded text is extracted directly. OCR produces text strings with bounding box coordinates.
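OCR engines typically emit word-level results; a post-processing step groups them into lines. A minimal sketch, assuming each word is a `(text, x0, y0, x1, y1)` tuple and that words on one line share roughly the same top edge:

```python
def words_to_lines(words, y_tol=5):
    """Group word-level OCR results (text, x0, y0, x1, y1) into lines:
    consecutive words whose top edges are within `y_tol` px form a
    line, and each line is read left to right."""
    lines = []
    for word in sorted(words, key=lambda w: (w[2], w[1])):
        if lines and abs(word[2] - lines[-1][-1][2]) <= y_tol:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [
        " ".join(w[0] for w in sorted(line, key=lambda w: w[1]))
        for line in lines
    ]

words = [
    ("world", 60, 10, 100, 24),
    ("Hello", 10, 12, 55, 26),
    ("Next", 10, 40, 45, 54),
]
print(words_to_lines(words))
# → ['Hello world', 'Next']
```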
Table Extraction
Table regions are processed by specialized table recognition models (Table Transformer, TableFormer) that detect rows, columns, and cells, then extract cell content with OCR. The result is structured table data (HTML, CSV, or JSON).
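Once the table model has recovered a cell grid, serializing it is straightforward. A sketch that renders a simple grid (no merged cells, first row as header) to a Markdown table:

```python
def cells_to_markdown(grid):
    """Render a recovered cell grid (list of rows, first row = header)
    as a Markdown table, a common output of table-extraction stages."""
    header, *body = grid
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)

print(cells_to_markdown([["Item", "Qty"], ["Widget", "3"]]))
```

Merged cells and spanning headers are exactly where this simple serialization breaks down, which is why they dominate the table-quality challenges below.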
Assembly & Output
All extracted elements (text, tables, figures) are assembled into a structured document following the detected reading order. Output formats include Markdown (for LLM consumption), HTML (preserving layout), JSON (for APIs), or DocX. Metadata (page numbers, fonts, confidence scores) is optionally included.
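The assembly step can be sketched as a serializer over elements already sorted into reading order; the element schema here (dicts with `type` and `content` keys) is illustrative, not any particular library's format:

```python
def assemble_markdown(elements):
    """Serialize parsed elements (already in reading order) as
    Markdown, the format most often fed to LLMs."""
    parts = []
    for el in elements:
        kind, content = el["type"], el["content"]
        if kind == "heading":
            parts.append("## " + content)
        elif kind == "table":
            parts.append(content)  # an already-rendered Markdown table
        elif kind == "figure":
            parts.append(f"![{content}](figure)")  # caption as alt text
        else:
            parts.append(content)  # plain text block
    return "\n\n".join(parts)

doc = [
    {"type": "heading", "content": "Results"},
    {"type": "text", "content": "Revenue grew."},
]
print(assemble_markdown(doc))
```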
Current Landscape
Document parsing in 2025 is a hybrid field. The best production pipelines (Docling, MinerU, Unstructured) compose specialized models — layout detectors, OCR engines, table extractors — into orchestrated workflows. Meanwhile, large VLMs (GPT-4o, Claude) can parse documents end-to-end from images but are expensive at scale and inconsistent on complex layouts. The practical sweet spot is using specialized parsers for the 90% case and VLMs for verification or edge cases. The field is increasingly driven by the RAG revolution — every enterprise wants to parse their documents into chunks suitable for LLM retrieval.
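The specialized-first, VLM-fallback pattern described above can be sketched as a confidence-gated router; both parser callables are hypothetical and assumed to return a `(result, confidence)` pair:

```python
def parse_with_fallback(page, fast_parser, vlm_parser, min_confidence=0.9):
    """Hybrid routing: run the cheap specialized pipeline first and
    escalate only low-confidence pages to an expensive VLM."""
    result, confidence = fast_parser(page)
    if confidence >= min_confidence:
        return result
    return vlm_parser(page)[0]

# Confident specialized parse: the VLM is never called.
print(parse_with_fallback("page-1",
                          lambda p: ("fast result", 0.95),
                          lambda p: ("vlm result", 1.0)))
# → fast result
```

The economics follow directly: if 90% of pages clear the threshold, VLM cost is paid on only the remaining 10%.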
Key Challenges
Table extraction quality — complex tables with merged cells, spanning headers, and nested structures remain the hardest parsing problem; accuracy drops sharply beyond simple grid tables
Figure understanding — parsing pipelines can detect figures but rarely extract meaningful information from charts, graphs, or diagrams
Multi-column layout — correctly threading text across columns, especially with footnotes, sidebars, and interrupting figures, causes frequent ordering errors
Mathematical equations — rendering equations correctly from scanned documents requires specialized math OCR (Nougat, im2latex) that most general parsers lack
Cross-page continuity — tables, paragraphs, and lists that span page boundaries need to be merged, which requires understanding that a sentence or row continues on the next page
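The cross-page case can be sketched as a naive merge heuristic, assuming each page is a list of element dicts where tables carry their rows; real systems also compare column widths and header text before merging:

```python
def merge_cross_page_tables(pages):
    """Merge a table that continues onto the next page: if one page
    ends with a table and the next begins with a table of the same
    column count, treat the second as a continuation (naive)."""
    merged = []
    for elements in pages:
        elements = list(elements)
        if (merged and elements
                and merged[-1]["type"] == "table"
                and elements[0]["type"] == "table"
                and len(merged[-1]["rows"][0]) == len(elements[0]["rows"][0])):
            merged[-1]["rows"].extend(elements[0]["rows"])
            elements = elements[1:]
        merged.extend(elements)
    return merged

pages = [
    [{"type": "table", "rows": [["a", "b"], ["1", "2"]]}],
    [{"type": "table", "rows": [["3", "4"]]},
     {"type": "text", "content": "After the table."}],
]
result = merge_cross_page_tables(pages)
print(len(result), result[0]["rows"])
```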
Quick Recommendations
General-purpose PDF parsing
Docling v2 or MinerU
Best open-source pipelines combining layout detection + OCR + table extraction; Docling handles most document types reliably
Academic papers
Nougat or Marker
Nougat handles equations and LaTeX natively; Marker produces clean Markdown from academic PDFs
High-accuracy commercial
Azure Document Intelligence or AWS Textract
Enterprise-grade with SLA, prebuilt models for invoices/receipts/IDs, and custom model training
LLM-powered parsing
GPT-4o or Claude Sonnet + structured output
Send document images to a VLM and request structured JSON output; handles edge cases that rule-based parsers miss
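VLM responses still need validation before downstream use. A sketch of the post-processing side, with a hypothetical schema (`title` and `tables` keys) standing in for whatever structure you request in the prompt:

```python
import json

def parse_vlm_output(raw):
    """Validate structured output requested from a VLM: strip a
    Markdown code fence if present, parse the JSON, and check
    required keys (hypothetical schema: 'title' and 'tables')."""
    text = raw.strip()
    if text.startswith("```"):
        # Models often wrap JSON in a fenced block despite instructions.
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    doc = json.loads(text)
    missing = {"title", "tables"} - doc.keys()
    if missing:
        raise ValueError(f"VLM response missing keys: {missing}")
    return doc

raw = '```json\n{"title": "Q3 Report", "tables": []}\n```'
print(parse_vlm_output(raw)["title"])
# → Q3 Report
```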
RAG pipeline preparation
Docling + chunking strategy
Parse documents to Markdown, chunk by section/heading, and embed for retrieval-augmented generation
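A minimal sketch of heading-based chunking over the parsed Markdown, so each embedded chunk stays topically coherent; production chunkers additionally enforce token limits and overlap:

```python
def chunk_by_heading(markdown):
    """Split parsed Markdown into retrieval chunks, one per heading."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

md = "# Intro\nSome text.\n# Methods\nMore text."
print(chunk_by_heading(md))
# → ['# Intro\nSome text.', '# Methods\nMore text.']
```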
What's Next
The field is converging toward end-to-end parsing models that replace multi-stage pipelines with a single model that reads a document image and outputs structured text. SmolDocling and similar initiatives aim to make this practical at scale. Longer-term: real-time parsing from camera feeds (phone captures), cross-document understanding (linking references across document collections), and multi-modal document understanding that extracts information from charts and diagrams as fluently as from text.
Benchmarks & SOTA
olmOCR-Bench
7,010 unit tests across 1,402 PDF documents. Tests parsing of tables, math, multi-column layouts, old scans, and more.
State of the Art: Chandra v0.1.0 (datalab-to), 99.9 (base)
OmniDocBench v1.5
981 annotated PDF pages across 9 document categories. Tests end-to-end document parsing including text, tables, and formulas.
State of the Art: MinerU 2.5 (OpenDataLab), 97.5 (layout-map)