olmOCR-bench is an evaluation dataset of 1,403 PDF files designed to test how well Optical Character Recognition (OCR) systems convert PDFs into clean Markdown, with particular attention to preserving complex structures such as tables, equations, and natural reading order. It is used together with the olmOCR toolkit, an open-source tool that uses a vision-language model for accurate PDF-to-text conversion.
No results indexed yet. Be the first to submit a score.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.