
Mistral OCR 3: Independent Benchmark Results

We ran the full OmniDocBench benchmark (1355 images) ourselves. Here's what we found.

Independently Verified Benchmark

CodeSOTA ran the full OmniDocBench evaluation suite on December 19, 2025. We processed all 1,355 images through Mistral's OCR API and computed metrics using the official evaluation tools.

Full dataset · Official eval tools · Reproducible · $2.71 total cost

  • Text Accuracy: 90.1% (verified)
  • Table TEDS: 70.9% (verified)
  • Formula Accuracy: 78.2% (verified)
  • Reading Order: 91.6% (verified)

OmniDocBench Results (Verified)

OmniDocBench is a comprehensive document parsing benchmark with 1,355 pages across 9 document types. The composite score is computed as ((1 - TextEditDist) * 100 + TableTEDS + FormulaCDM) / 3.
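The composite formula can be sanity-checked against the verified numbers in the table (a minimal sketch; the function name is ours, not part of the official eval tools):

```python
def composite_score(text_edit_dist: float, table_teds: float, formula_cdm: float) -> float:
    """OmniDocBench composite: mean of text accuracy (as %), table TEDS, and formula CDM."""
    return ((1 - text_edit_dist) * 100 + table_teds + formula_cdm) / 3

# Verified Mistral OCR 3 inputs: text edit distance 0.099, table TEDS 70.9, formula CDM 78.2
print(round(composite_score(0.099, 70.9, 78.2), 2))  # 79.73 — matches the reported 79.75 to within input rounding
```

The small gap to 79.75 comes from feeding the formula rounded inputs; the official tools use unrounded per-page scores.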

| Metric | Mistral OCR 3 | GPT-4o | PaddleOCR-VL |
| --- | --- | --- | --- |
| Composite Score | 79.75 (verified) | ~85 | 92.86 |
| Text Edit Distance | 0.099 (verified) | 0.02 | 0.03 |
| Table TEDS | 70.9% (verified) | - | 93.5% |
| Table Structure TEDS | 75.3% (verified) | - | - |
| Formula (Edit Distance) | 0.218 (verified) | - | - |
| Reading Order | 91.6% (verified) | - | - |

Lower is better for Edit Distance metrics. CodeSOTA verification date: December 19, 2025.

Performance by Document Type

Mistral OCR 3 performs best on academic papers and exam papers, but struggles with newspapers:

| Document Type | Text Accuracy | Table TEDS |
| --- | --- | --- |
| Academic Literature | 97.9% | 83.0% |
| Exam Papers | 92.8% | 88.0% |
| Books | 93.9% | 82.7% |
| Research Reports | 95.8% | 82.0% |
| Magazines | 97.9% | 71.0% |
| PPT Slides | 95.7% | 72.6% |
| Newspapers | 67.0% | 58.3% |

Performance by Language

| Language | Text Accuracy |
| --- | --- |
| English | 94.6% |
| Chinese | 86.1% |
| Mixed | 86.2% |

Pricing

  • Standard API: $2 per 1,000 pages (real-time processing via API)
  • Batch API: $1 per 1,000 pages (50% discount for async processing)

Our full benchmark run cost $2.71 for 1,355 images using the standard API.
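The benchmark bill follows directly from the flat per-page pricing (a trivial helper; the function name is ours):

```python
def ocr_cost_usd(pages: int, price_per_1000: float) -> float:
    """Cost of an OCR run at a flat per-1,000-pages rate."""
    return pages * price_per_1000 / 1000

print(ocr_cost_usd(1355, 2.00))  # 2.71 — the standard-API cost of our full run
print(ocr_cost_usd(1355, 1.00))  # 1.355 — the same run via the batch API
```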

Code Example

from mistralai import Mistral
import base64

client = Mistral(api_key="your-api-key")

# Load document
with open("document.pdf", "rb") as f:
    doc_data = base64.b64encode(f.read()).decode()

# OCR with Mistral OCR 3; the API expects the document as a base64 data URI
response = client.ocr.process(
    model="mistral-ocr-2512",
    document={
        "type": "document_url",
        "document_url": f"data:application/pdf;base64,{doc_data}",
    },
)

# Output is markdown per page, with tables rendered as HTML
print("\n\n".join(page.markdown for page in response.pages))

When to Use Mistral OCR 3

Excellent For
  • Academic papers (97.9% accuracy)
  • Exam papers (92.8% + 88% tables)
  • Research reports and books
  • Cost-sensitive high-volume OCR
  • English text extraction
Weak Points
  • Newspapers (67% - avoid)
  • Complex multi-column layouts
  • Chinese text (86.1% vs 94.6% for English)
  • Table recognition (70.9% TEDS, more than 20 points behind PaddleOCR-VL)

Verdict

Composite Score: 79.75 - Mid-tier performance on OmniDocBench.

Mistral OCR 3 sits between traditional OCR and top VLMs. It's significantly behind PaddleOCR-VL (92.86) and GPT-4o (~85), but at $1-2/1000 pages, it's one of the cheapest options.

Best use case: High-volume academic/research document processing where cost matters more than absolute accuracy.

Model ID: mistral-ocr-2512
API: docs.mistral.ai
Verified: December 19, 2025 by CodeSOTA