ParseBench leaderboard (2026). All scores come from ParseBench Table 5, and every entry is marked verified.

| # | System | Notes | Overall | Tables | Charts | Content faithfulness | Semantic formatting | Visual grounding |
|---|---|---|---|---|---|---|---|---|
| 1 | LlamaParse Agentic | SOTA on ParseBench: highest overall score and best-in-column on Tables, Charts, Semantic formatting, and Visual grounding. Cost ~$0.012/page. | 84.9 | 90.7 | 78.1 | 89.7 | 85.2 | 80.6 |
| 2 | LlamaParse Cost Effective | LlamaParse in cost-effective mode; competitive with Gemini 3 Flash (minimal thinking) at ~1/10 the cost. | 71.9 | 73.2 | 66.7 | 88 | 73 | 58.6 |
| 3 | Google Gemini 3 Flash (also listed as "Gemini 3 Flash") | Default high thinking, evaluated as a VLM parser. Strongest VLM overall; its Tables score (89.9) is the best among VLMs, second only to LlamaParse Agentic (90.7). | 71 | 89.9 | 64.8 | 86.2 | 58.4 | 56 |
| 4 | Reducto | Default non-agentic pipeline. Second-best specialised parser overall, after LlamaParse. | 67.8 | 70.3 | 57 | 86.4 | 56.8 | 68.7 |
| 5 | Qwen 3 VL (also listed as "Qwen3-VL-4B") | Evaluated via a parse-with-layout pipeline. Visual grounding uses a separate layout-only pipeline; 4 pages where that pipeline failed are excluded. | 62 | 74.7 | 28.2 | 87.6 | 64.2 | 55.2 |
| 6 | Azure Document Intelligence | Prebuilt layout model. Best visual grounding outside LlamaParse (73.8). | 59.6 | 86 | 1.6 | 84.9 | 51.9 | 73.8 |
| 7 | Dots OCR 1.5 | Highest content-faithfulness score in the benchmark (90.0), but charts collapse to 0.9. | 55.8 | 85.2 | 0.9 | 90 | 47 | 55.8 |
| 8 | Extend | Extend parse pipeline. | 55.8 | 85.1 | 1.6 | 84.1 | 47.4 | 60.7 |
| 9 | Docling | Open-source (OSS) pipeline. Visual grounding (66.1) excludes 13 pages where the pipeline failed. | 50.6 | 66.4 | 52.8 | 66.9 | 1 | 66.1 |
| 10 | Google Cloud Document AI | Layout parser. | 50.4 | 55.1 | 1.4 | 83.7 | 50.5 | 61.3 |
| 11 | AWS Textract | Layout pipeline. Strong on visual grounding (70.4) but near zero on charts (6.0) and semantic formatting (3.7). | 47.9 | 84.6 | 6 | 74.8 | 3.7 | 70.4 |
| 12 | OpenAI GPT-5 Mini (also listed as "GPT-5 mini") | Evaluated as a VLM parser with reasoning set to medium. | 46.8 | 69.8 | 30.1 | 82.3 | 45.8 | 6.2 |
| 13 | Anthropic Haiku 4.5 | Claude Haiku 4.5 with extended thinking enabled, evaluated as a VLM parser. | 45.2 | 77.2 | 13.8 | 78.7 | 49.4 | 6.7 |
| 14 | LandingAI | LandingAI ADE parse pipeline. | 45.2 | 73.7 | 10.9 | 88.6 | 27.9 | 25.1 |
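
The overall scores in this leaderboard appear to be the unweighted mean of the five sub-scores. For LlamaParse Agentic, for example, (90.7 + 78.1 + 89.7 + 85.2 + 80.6) / 5 = 84.86 ≈ 84.9. This is an inference from the published numbers, not a formula the table states. Every entry matches to within 0.1; Gemini 3 Flash recomputes to 71.06 against a reported 71, possibly because the published sub-scores are themselves rounded. The sketch below recomputes that mean for each system and picks the top scorer per column, which gives a quick way to check the "best-in-column" notes. The names `LEADERBOARD`, `unweighted_overall`, and `best_in_column` are hypothetical, and alias rows are collapsed to one entry each.

```python
from statistics import mean

# Column order for the sub-scores below (from ParseBench Table 5).
METRICS = (
    "tables",
    "charts",
    "content_faithfulness",
    "semantic_formatting",
    "visual_grounding",
)

# Hypothetical structure: {system: (sub_scores in METRICS order, reported_overall)}
LEADERBOARD = {
    "LlamaParse Agentic": ((90.7, 78.1, 89.7, 85.2, 80.6), 84.9),
    "LlamaParse Cost Effective": ((73.2, 66.7, 88.0, 73.0, 58.6), 71.9),
    "Google Gemini 3 Flash": ((89.9, 64.8, 86.2, 58.4, 56.0), 71.0),
    "Reducto": ((70.3, 57.0, 86.4, 56.8, 68.7), 67.8),
    "Qwen 3 VL": ((74.7, 28.2, 87.6, 64.2, 55.2), 62.0),
    "Azure Document Intelligence": ((86.0, 1.6, 84.9, 51.9, 73.8), 59.6),
    "Dots OCR 1.5": ((85.2, 0.9, 90.0, 47.0, 55.8), 55.8),
    "Extend": ((85.1, 1.6, 84.1, 47.4, 60.7), 55.8),
    "Docling": ((66.4, 52.8, 66.9, 1.0, 66.1), 50.6),
    "Google Cloud Document AI": ((55.1, 1.4, 83.7, 50.5, 61.3), 50.4),
    "AWS Textract": ((84.6, 6.0, 74.8, 3.7, 70.4), 47.9),
    "OpenAI GPT-5 Mini": ((69.8, 30.1, 82.3, 45.8, 6.2), 46.8),
    "Anthropic Haiku 4.5": ((77.2, 13.8, 78.7, 49.4, 6.7), 45.2),
    "LandingAI": ((73.7, 10.9, 88.6, 27.9, 25.1), 45.2),
}


def unweighted_overall(sub_scores):
    """Assumed aggregation: the plain (unweighted) mean of the five sub-scores."""
    return mean(sub_scores)


def best_in_column(board):
    """Map each metric name to (system, score) for the top scorer in that column."""
    best = {}
    for i, metric in enumerate(METRICS):
        # item is (system, (sub_scores, reported)); compare on the i-th sub-score.
        system, (subs, _) = max(board.items(), key=lambda item: item[1][0][i])
        best[metric] = (system, subs[i])
    return best


if __name__ == "__main__":
    for system, (subs, reported) in LEADERBOARD.items():
        recomputed = unweighted_overall(subs)
        print(f"{system:28} reported={reported:5.1f} mean={recomputed:6.2f}")
    for metric, (system, score) in best_in_column(LEADERBOARD).items():
        print(f"best {metric}: {system} ({score})")
```

When run as a script, it prints each system's reported and recomputed overall score, followed by the top system in each column. The per-column winners agree with the notes in the table: LlamaParse Agentic leads four columns, and Dots OCR 1.5 leads content faithfulness.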