Document Parsing · benchmark dataset · 2024 · EN

olmOCR-Bench.

7,010 unit tests across 1,402 PDF documents. Tests parsing of tables, math, multi-column layouts, old scans, and more.

§ 01 · Leaderboard

Best published scores.

56 results indexed across 9 metrics. Shaded row marks current SOTA; ties broken by submission date.


Primary metric · pass-rate · higher is better
All metrics · arxiv, base, headers-footers, long-tiny-text, multi-column, old-scans, old-scans-math, pass-rate, tables
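The primary pass-rate appears to be an unweighted mean of the eight category scores. For the two models that have a row in every category table on this page (olmOCR v0.4.0 and Chandra v0.1.0), averaging the category rows reproduces the listed pass-rate to within rounding. This is an observation about the published numbers, not a documented definition; the helper name `macro_average` in the sketch below is chosen for illustration only.

```python
from statistics import fmean

# Category scores copied from the per-metric tables on this page.
CATEGORY_SCORES = {
    "olmOCR v0.4.0": {
        "arxiv": 83.0, "base": 99.7, "headers-footers": 96.1,
        "long-tiny-text": 81.9, "multi-column": 83.7, "old-scans": 47.7,
        "old-scans-math": 82.3, "tables": 84.9,
    },
    "Chandra v0.1.0": {
        "arxiv": 82.2, "base": 99.9, "headers-footers": 90.8,
        "long-tiny-text": 92.3, "multi-column": 81.2, "old-scans": 50.4,
        "old-scans-math": 80.3, "tables": 88.0,
    },
}

# Primary pass-rate values as listed in the pass-rate table.
REPORTED_PASS_RATE = {"olmOCR v0.4.0": 82.4, "Chandra v0.1.0": 83.1}


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean over categories (assumed aggregation, see lead-in)."""
    return fmean(scores.values())


for model, scores in CATEGORY_SCORES.items():
    mean = macro_average(scores)
    print(f"{model}: category mean {mean:.2f}, reported {REPORTED_PASS_RATE[model]:.2f}")
```

If the aggregation really is a macro average, a small category such as old-scans weighs as much as a large one, which would explain why a weak old-scans score pulls the primary metric down noticeably.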
arxiv
5 rows
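| # | Model | Type | Org | Submitted | Paper / code | arxiv |
|---|---|---|---|---|---|---|
| 01 | LightOnOCR-2-1B | OSS | LightOn | Jan 2026 | paper | 89.60 |
| 02 | Marker 1.10.0 | OSS | VikParuchuri | Dec 2025 | github-readme | 83.80 |
| 03 | olmOCR v0.4.0 | OSS | Allen AI | Oct 2025 | paper | 83 |
| 04 | Chandra v0.1.0 | OSS | datalab-to | Dec 2025 | github-readme | 82.20 |
| 05 | Qianfan-OCR | OSS | Baidu Qianfan | Mar 2026 | paper | 80.10 |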
base
4 rows
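| # | Model | Type | Org | Submitted | Paper / code | base |
|---|---|---|---|---|---|---|
| 01 | Chandra v0.1.0 | OSS | datalab-to | Dec 2025 | github-readme | 99.90 |
| 02 | olmOCR v0.4.0 | OSS | Allen AI | Oct 2025 | paper | 99.70 |
| 03 | Qianfan-OCR | OSS | Baidu Qianfan | Mar 2026 | paper | 99.60 |
| 04 | LightOnOCR-2-1B | OSS | LightOn | Jan 2026 | paper | 99.60 |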
headers-footers
4 rows
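| # | Model | Type | Org | Submitted | Paper / code | headers-footers |
|---|---|---|---|---|---|---|
| 01 | olmOCR v0.4.0 | OSS | Allen AI | Oct 2025 | paper | 96.10 |
| 02 | olmOCR v0.3.0 | OSS | Allen AI | Dec 2025 | github-readme | 95.10 |
| 03 | Chandra v0.1.0 | OSS | datalab-to | Dec 2025 | github-readme | 90.80 |
| 04 | Qianfan-OCR | OSS | Baidu Qianfan | Mar 2026 | paper | 42 |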
long-tiny-text
4 rows
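| # | Model | Type | Org | Submitted | Paper / code | long-tiny-text |
|---|---|---|---|---|---|---|
| 01 | Chandra v0.1.0 | OSS | datalab-to | Dec 2025 | github-readme | 92.30 |
| 02 | LightOnOCR-2-1B | OSS | LightOn | Jan 2026 | paper | 91.40 |
| 03 | olmOCR v0.4.0 | OSS | Allen AI | Oct 2025 | paper | 81.90 |
| 04 | Qianfan-OCR | OSS | Baidu Qianfan | Mar 2026 | paper | 80.40 |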
multi-column
4 rows
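| # | Model | Type | Org | Submitted | Paper / code | multi-column |
|---|---|---|---|---|---|---|
| 01 | Qianfan-OCR | OSS | Baidu Qianfan | Mar 2026 | paper | 92.20 |
| 02 | LightOnOCR-2-1B | OSS | LightOn | Jan 2026 | paper | 84.80 |
| 03 | olmOCR v0.4.0 | OSS | Allen AI | Oct 2025 | paper | 83.70 |
| 04 | Chandra v0.1.0 | OSS | datalab-to | Dec 2025 | github-readme | 81.20 |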
old-scans
5 rows
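| # | Model | Type | Org | Submitted | Paper / code | old-scans |
|---|---|---|---|---|---|---|
| 01 | Qianfan-OCR | OSS | Baidu Qianfan | Mar 2026 | paper | 73.10 |
| 02 | Chandra v0.1.0 | OSS | datalab-to | Dec 2025 | github-readme | 50.40 |
| 03 | olmOCR v0.4.0 | OSS | Allen AI | Oct 2025 | paper | 47.70 |
| 04 | LightOnOCR-2-1B | OSS | LightOn | Jan 2026 | paper | 42.20 |
| 05 | GPT-4o | API | OpenAI | Dec 2025 | github-readme | 40.70 |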
old-scans-math
4 rows
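| # | Model | Type | Org | Submitted | Paper / code | old-scans-math |
|---|---|---|---|---|---|---|
| 01 | LightOnOCR-2-1B | OSS | LightOn | Jan 2026 | paper | 85.60 |
| 02 | olmOCR v0.4.0 | OSS | Allen AI | Oct 2025 | paper | 82.30 |
| 03 | Chandra v0.1.0 | OSS | datalab-to | Dec 2025 | github-readme | 80.30 |
| 04 | olmOCR v0.3.0 | OSS | Allen AI | Dec 2025 | github-readme | 79.90 |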
pass-rate · primary
21 rows
| # | Model | Type | Org | Submitted | Paper / code | pass-rate |
|---|---|---|---|---|---|---|
| 01 | dots.mocr | OSS | RedNote | Mar 2026 | github-readme | 83.90 |
| 02 | LightOnOCR-2-1B | OSS | LightOn | Jan 2026 | paper | 83.20 |
| 03 | Chandra v0.1.0 | OSS | datalab-to | Dec 2025 | alphaxiv-leaderboard | 83.10 |
| 04 | Infinity-Parser 7B | OSS | | Dec 2025 | alphaxiv-leaderboard | 82.50 |
| 05 | olmOCR v0.4.0 | OSS | Allen AI | Dec 2025 | alphaxiv-leaderboard | 82.40 |
| 06 | PaddleOCR-VL | OSS | Baidu | Dec 2025 | alphaxiv-leaderboard | 80 |
| 07 | Qianfan-OCR | OSS | Baidu Qianfan | Mar 2026 | paper | 79.80 |
| 08 | Qwen3-VL-4B | OSS | Alibaba Qwen | Mar 2026 | paper | 79.20 |
| 09 | PaddleOCR-VL-1.5 | OSS | Baidu PaddlePaddle | Mar 2026 | paper | 79.10 |
| 10 | dots.ocr 3B | OSS | RedNote HILab | Dec 2025 | github-readme | 79.10 |
| 11 | Mistral OCR 3 | API | Mistral | Dec 2025 | mistral-announcement | 78 |
| 12 | Marker 1.10.0 | OSS | VikParuchuri | Dec 2025 | github-readme | 76.50 |
| 13 | Marker 1.10.1 | OSS | VikParuchuri | Dec 2025 | alphaxiv-leaderboard | 76.10 |
| 14 | MonkeyOCR-pro-3B | OSS | | Jun 2025 | paper | 75.80 |
| 15 | DeepSeek-OCR | OSS | DeepSeek | Dec 2025 | alphaxiv-leaderboard | 75.70 |
| 16 | DeepSeek-OCR | OSS | DeepSeek | Dec 2025 | github-readme | 75.40 |
| 17 | MinerU 2.5 | OSS | OpenDataLab | Dec 2025 | alphaxiv-leaderboard | 75.20 |
| 18 | Mistral OCR 2 | API | Mistral | Dec 2025 | alphaxiv-leaderboard | 72 |
| 19 | GPT-4o (Anchored) | | OpenAI | Dec 2025 | github-readme | 69.90 |
| 20 | Nanonets OCR2 3B | | Nanonets | Dec 2025 | alphaxiv-leaderboard | 69.50 |
| 21 | Gemini Flash 2 | | Google | Dec 2025 | github-readme | 63.80 |
tables
5 rows
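| # | Model | Type | Org | Submitted | Paper / code | tables |
|---|---|---|---|---|---|---|
| 01 | LightOnOCR-2-1B | OSS | LightOn | Jan 2026 | paper | 89 |
| 02 | dots.ocr 3B | OSS | RedNote HILab | Dec 2025 | github-readme | 88.30 |
| 03 | Chandra v0.1.0 | OSS | datalab-to | Dec 2025 | github-readme | 88 |
| 04 | olmOCR v0.4.0 | OSS | Allen AI | Oct 2025 | paper | 84.90 |
| 05 | Qianfan-OCR | OSS | Baidu Qianfan | Mar 2026 | paper | 81.60 |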
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
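The ordering rule (score descending, ties broken by submission date) can be sketched as below. The direction of the tie-break is an assumption inferred from the tables: in both visible ties (99.60 on base, 79.10 on pass-rate) the later-dated row is listed first. The `Row` type and `rank` function are illustrative names, not part of any Codesota tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    submitted: tuple[int, int]  # (year, month), as shown in the "Submitted" column
    score: float


def rank(rows: list[Row]) -> list[Row]:
    """Highest score first; on equal scores, the later date first (assumed)."""
    return sorted(rows, key=lambda r: (r.score, r.submitted), reverse=True)


# Rows of the "base" table, deliberately shuffled.
BASE_ROWS = [
    Row("LightOnOCR-2-1B", (2026, 1), 99.6),
    Row("Qianfan-OCR", (2026, 3), 99.6),
    Row("olmOCR v0.4.0", (2025, 10), 99.7),
    Row("Chandra v0.1.0", (2025, 12), 99.9),
]

ranked = rank(BASE_ROWS)
for position, row in enumerate(ranked, start=1):
    marker = "SOTA" if position == 1 else ""
    print(f"{position:02d} {row.model:<18} {row.score:.2f} {marker}")
```

Sorting on the `(score, submitted)` tuple keeps the rule in one place; if the tie-break turns out to favor earlier submissions instead, only the key needs to change.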
§ 03 · Progress

4 steps of state of the art.

Each row below marks a model that broke the previous record on pass-rate. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · pass-rate
  1. Jun 5, 2025 · MonkeyOCR-pro-3B · 75.80
  2. Dec 16, 2025 · Chandra v0.1.0 · datalab-to · 83.10
  3. Jan 20, 2026 · LightOnOCR-2-1B · LightOn · 83.20
  4. Mar 19, 2026 · dots.mocr · RedNote · 83.90
Fig 3 · SOTA-setting models only. 4 entries span Jun 2025 to Mar 2026.
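A SOTA line like the one above is a running maximum over date-ordered submissions: an entry is kept only if it strictly beats every earlier score. The sketch below is a minimal illustration using a subset of pass-rate rows from this page at month granularity. The function name `sota_line` and the within-month ordering (best score first, so a weaker same-month entry cannot register as a record) are assumptions, not the site's documented procedure.

```python
def sota_line(entries):
    """Return the entries that strictly improve on all earlier scores.

    Each entry is (month, model, score) with month as "YYYY-MM".
    """
    # Order by month; inside a month, put the best score first (assumption).
    ordered = sorted(entries, key=lambda e: (e[0], -e[2]))
    best = float("-inf")
    records = []
    for month, model, score in ordered:
        if score > best:  # strict improvement only; ties do not set a record
            records.append((month, model, score))
            best = score
    return records


# A subset of pass-rate rows from the leaderboard, in arbitrary order.
ENTRIES = [
    ("2026-03", "Qianfan-OCR", 79.80),
    ("2025-12", "Marker 1.10.0", 76.50),
    ("2026-03", "dots.mocr", 83.90),
    ("2025-06", "MonkeyOCR-pro-3B", 75.80),
    ("2026-01", "LightOnOCR-2-1B", 83.20),
    ("2025-12", "Chandra v0.1.0", 83.10),
]

records = sota_line(ENTRIES)
for month, model, score in records:
    print(f"{month}  {model:<18} {score:.2f}")
```

Because several leaderboard dates are access dates rather than release dates, running this over all 21 pass-rate rows would not necessarily reproduce the four steps; exact release dates would be needed for that.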
§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with frozen commit + seed
  • 03 · Declared evaluation environment (Python, deps)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies
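The checklist above can be turned into a pre-submission self-check. The sketch below is hypothetical: the dictionary layout, every field name (`checkpoint_url`, `api_endpoint`, `reproduction`, `environment`, `scores`, `contact`), and the example values are invented for illustration and do not describe the actual submission format. Only the nine metric names come from this page.

```python
# Metric names as listed under "All metrics" on this page.
REQUIRED_METRICS = {
    "arxiv", "base", "headers-footers", "long-tiny-text", "multi-column",
    "old-scans", "old-scans-math", "pass-rate", "tables",
}


def missing_items(submission: dict) -> list[str]:
    """List checklist items that a (hypothetical) submission dict lacks."""
    problems = []
    if not (submission.get("checkpoint_url") or submission.get("api_endpoint")):
        problems.append("public checkpoint or API endpoint")
    repro = submission.get("reproduction", {})
    if not repro.get("commit"):
        problems.append("frozen commit")
    if repro.get("seed") is None:  # a seed of 0 is still a valid seed
        problems.append("seed")
    env = submission.get("environment", {})
    if not env.get("python") or "dependencies" not in env:
        problems.append("evaluation environment")
    undeclared = REQUIRED_METRICS - set(submission.get("scores", {}))
    problems.extend(f"score for {metric}" for metric in sorted(undeclared))
    if not submission.get("contact"):
        problems.append("contact")
    return problems


# Placeholder values only; none of these describe a real submission.
example = {
    "checkpoint_url": "https://example.org/my-model",
    "reproduction": {"commit": "0123abc", "seed": 0},
    "environment": {"python": "3.11", "dependencies": ["torch"]},
    "scores": {metric: 0.0 for metric in REQUIRED_METRICS},
    "contact": "maintainer@example.org",
}

print(missing_items(example))
```

Returning a list of missing items, rather than raising on the first one, lets a submitter fix everything in a single pass.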