PDF content extraction benchmark with 7,010 unit tests across 1,402 PDF documents.
Higher is better
| # | Model | Score | Source |
|---|---|---|---|
| ★ | chandra-ocr-0.1.0 | 99.9 | codesota-api |
| 2 | olmocr-v0.4.0 | 99.7 | codesota-api |
| 3 | LightOnOCR-2-1B | 99.6 | codesota-api |
| 4 | Qianfan-OCR | 99.6 | codesota-api |
Higher is better
| # | Model | Score | Source |
|---|---|---|---|
| ★ | olmocr-v0.4.0 | 96.1 | codesota-api |
| 2 | olmocr-v0.3.0 | 95.1 | codesota-api |
| 3 | chandra-ocr-0.1.0 | 90.8 | codesota-api |
| 4 | Qianfan-OCR | 42.0 | codesota-api |
Higher is better
| # | Model | Score | Source |
|---|---|---|---|
| ★ | chandra-ocr-0.1.0 | 92.3 | codesota-api |
| 2 | LightOnOCR-2-1B | 91.4 | codesota-api |
| 3 | olmocr-v0.4.0 | 81.9 | codesota-api |
| 4 | Qianfan-OCR | 80.4 | codesota-api |
Higher is better
| # | Model | Score | Source |
|---|---|---|---|
| ★ | Qianfan-OCR | 92.2 | codesota-api |
| 2 | LightOnOCR-2-1B | 84.8 | codesota-api |
| 3 | olmocr-v0.4.0 | 83.7 | codesota-api |
| 4 | chandra-ocr-0.1.0 | 81.2 | codesota-api |
Higher is better
| # | Model | Score | Source |
|---|---|---|---|
| ★ | LightOnOCR-2-1B | 89.6 | codesota-api |
| 2 | marker-1.10.0 | 83.8 | codesota-api |
| 3 | olmocr-v0.4.0 | 83.0 | codesota-api |
| 4 | chandra-ocr-0.1.0 | 82.2 | codesota-api |
| 5 | Qianfan-OCR | 80.1 | codesota-api |
Higher is better
| # | Model | Score | Source |
|---|---|---|---|
| ★ | LightOnOCR-2-1B | 89.0 | codesota-api |
| 2 | dots-ocr-3b | 88.3 | codesota-api |
| 3 | chandra-ocr-0.1.0 | 88.0 | codesota-api |
| 4 | olmocr-v0.4.0 | 84.9 | codesota-api |
| 5 | Qianfan-OCR | 81.6 | codesota-api |
Higher is better
| # | Model | Score | Source |
|---|---|---|---|
| ★ | LightOnOCR-2-1B | 85.6 | codesota-api |
| 2 | olmocr-v0.4.0 | 82.3 | codesota-api |
| 3 | chandra-ocr-0.1.0 | 80.3 | codesota-api |
| 4 | olmocr-v0.3.0 | 79.9 | codesota-api |
Percentage of unit tests passed
Higher is better
| # | Model | Score | Source |
|---|---|---|---|
| ★ | dots.mocr | 83.9% | codesota-api |
| 2 | LightOnOCR-2-1B | 83.2% | codesota-api |
| 3 | chandra-ocr-0.1.0 | 83.1% | codesota-api |
| 4 | infinity-parser-7b | 82.5% | codesota-api |
| 5 | olmocr-v0.4.0 | 82.4% | codesota-api |
| 6 | paddleocr-vl | 80.0% | codesota-api |
| 7 | Qianfan-OCR | 79.8% | codesota-api |
| 8 | Qwen3-VL-4B | 79.2% | codesota-api |
| 9 | PaddleOCR-VL-1.5 | 79.1% | codesota-api |
| 10 | dots-ocr-3b | 79.1% | codesota-api |
| 11 | mistral-ocr-3 | 78.0% | codesota-api |
| 12 | marker-1.10.0 | 76.5% | codesota-api |
| 13 | marker-1.10.1 | 76.1% | codesota-api |
| 14 | MonkeyOCR-pro-3B | 75.8% | codesota-api |
| 15 | deepseek-ocr | 75.7% | codesota-api |
| 16 | mineru-2.5 | 75.2% | codesota-api |
| 17 | mistral-ocr-api | 72.0% | codesota-api |
| 18 | gpt-4o-anchored | 69.9% | codesota-api |
| 19 | nanonets-ocr2-3b | 69.5% | codesota-api |
| 20 | gemini-flash-2 | 63.8% | codesota-api |
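
The "percentage of unit tests passed" metric used in the table above can be illustrated with a minimal sketch. It assumes the simplest reading of the metric, passed tests divided by total tests, and the benchmark's actual aggregation (for example, averaging per-category pass rates) may differ. The `UnitTestResult` type, the `pass_percentage` function, and the sample file names are hypothetical and not part of the benchmark.

```python
from dataclasses import dataclass


@dataclass
class UnitTestResult:
    """One pass/fail check against a model's output for a PDF (hypothetical type)."""
    document: str  # illustrative PDF identifier, not a real benchmark file
    passed: bool


def pass_percentage(results):
    """Return the share of passed tests as a percentage, rounded to one decimal."""
    if not results:
        raise ValueError("no test results to score")
    passed = sum(1 for r in results if r.passed)
    return round(100.0 * passed / len(results), 1)


# Illustrative data only: five tests over three documents, four of them passing.
results = [
    UnitTestResult("a.pdf", True),
    UnitTestResult("a.pdf", False),
    UnitTestResult("b.pdf", True),
    UnitTestResult("b.pdf", True),
    UnitTestResult("c.pdf", True),
]

print(pass_percentage(results))  # 80.0
```

Under this reading each test counts equally, so documents with more tests weigh more heavily in the score than documents with fewer.
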
Higher is better
| # | Model | Score | Source |
|---|---|---|---|
| ★ | Qianfan-OCR | 73.1 | codesota-api |
| 2 | chandra-ocr-0.1.0 | 50.4 | codesota-api |
| 3 | olmocr-v0.4.0 | 47.7 | codesota-api |
| 4 | LightOnOCR-2-1B | 42.2 | codesota-api |
| 5 | gpt-4o | 40.7 | codesota-api |