Document Parsing
Converting documents such as PDFs into structured formats like Markdown or HTML.
Benchmarks & Datasets
OmniDocBench
981 annotated PDF pages across 9 document categories. Tests end-to-end document parsing, including text, tables, and formulas.
Images: 981 · Year: 2024
olmOCR-Bench
7,010 unit tests across 1,402 PDF documents. Tests parsing of tables, math, multi-column layouts, old scans, and more.
Images: 1.4K · Year: 2024