Codesota · OCR · Mistral OCR
API service · Production ready

Mistral OCR.

Processing thousands of pages with equations or multilingual text? Need fast, cheap OCR via API without managing infrastructure? Mistral OCR outputs Markdown at $0.001/page.

API Service · Paid · Production Ready

§ 01 · Verified

We tested it.

Tested on December 17, 2025: 9-page PDF processed in 9.04 seconds, 34,656 chars output.

§ 02 · Headline numbers

The four metrics that matter.

  • Our test (9 pages): 9.04 s
  • Chars output: 34.6K
  • Per page: $0.001
  • Max file size: 50 MB

§ 03 · Benchmark claims

Mistral's self-reported numbers.

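| Category | Mistral OCR | GPT-5.4 | Google Doc AI | Azure OCR |
| --- | --- | --- | --- | --- |
| Overall | 94.9% | ~85% | 83.4% | 89.5% |
| Scanned Docs | 98.96% | ~95% | 96.15% | ~94% |
| Math/Equations | 94.29% | ~88% | ~75% | ~70% |
| Multilingual | 89.55% | 86.0% | ~82% | 87.52% |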

Source: Mistral's internal benchmarks. Independent verification pending.

§ 04 · Independent results

Mixed results in the wild.

  • Koncile.ai: 98.75% transcription accuracy but 27.5% missing data in structured extraction
  • Docsumo: "Even with moderately clean documents, it often missed key data blocks or misinterpreted layout structures"
  • Parsio.io: Fast and cheap, but less robust than enterprise solutions for complex layouts

We recommend testing on your specific document types before production deployment.
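
One way to run such a test is to compare the OCR output against a transcript you have checked by hand, and to look for the specific values you need downstream. The sketch below uses only the Python standard library. The function names and the sample field values are hypothetical, and the similarity ratio from `difflib` is a rough proxy, not the metric used by any of the vendors quoted above.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse whitespace so layout differences do not count as errors."""
    return " ".join(text.split())


def similarity(ocr_text: str, reference_text: str) -> float:
    """Return a 0..1 similarity ratio between OCR output and a reference transcript."""
    a = normalize(ocr_text)
    b = normalize(reference_text)
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def missing_fields(ocr_text: str, expected_values: list[str]) -> list[str]:
    """List expected values (e.g. an invoice number or total) absent from the OCR output."""
    haystack = normalize(ocr_text).lower()
    return [v for v in expected_values if normalize(v).lower() not in haystack]
```

A high similarity score with a non-empty `missing_fields` result would match the pattern reported above: good transcription overall, but key data blocks dropped.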

§ 05 · Quick start

Three steps to first parse.

1. Install SDK

pip install mistralai

2. Set API Key

export MISTRAL_API_KEY="your-api-key"

Get your key at console.mistral.ai

3. Process Document

import os

from mistralai import Mistral

# Read the API key from the environment variable set in step 2
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Process a PDF from a URL
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234"
    }
)

# Print the Markdown output page by page
for page in ocr_response.pages:
    print(page.markdown)
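
To keep the result instead of printing it, the per-page Markdown can be joined into one file. This is a minimal sketch: it assumes only that each page object exposes a `markdown` attribute, as in the loop above. The helper names and the `---` page separator are illustrative choices, not part of the Mistral SDK.

```python
from pathlib import Path


def pages_to_markdown(pages, separator: str = "\n\n---\n\n") -> str:
    """Join the `markdown` attribute of each page into one document."""
    return separator.join(page.markdown for page in pages)


def save_markdown(pages, out_path: str) -> int:
    """Write the combined Markdown to disk and return the character count."""
    text = pages_to_markdown(pages)
    Path(out_path).write_text(text, encoding="utf-8")
    return len(text)
```

With the response from step 3, a call such as `save_markdown(ocr_response.pages, "output.md")` would write the file and return the total output size in characters.
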
§ 06 · Local files

Process local files.

import base64
import os

from mistralai import Mistral


def encode_file(file_path):
    """Read a file and return its contents as a base64 string."""
    with open(file_path, "rb") as f:
        return base64.b64encode(f.read()).decode('utf-8')


client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Process a local PDF by passing it as a base64 data URL
base64_pdf = encode_file("invoice.pdf")
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": f"data:application/pdf;base64,{base64_pdf}"
    }
)
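
Since the page lists a 50MB maximum file size, it can help to check the file before encoding and uploading it. The sketch below is an assumption-laden helper, not SDK code: `pdf_to_data_url` is a hypothetical name, and it reads "50MB" as 50 MiB and applies the limit to the raw file rather than the base64 payload, which may not match how the API counts.

```python
import base64
from pathlib import Path

# Assumption: "50MB" is read as 50 MiB and applies to the raw file size.
MAX_BYTES = 50 * 1024 * 1024


def pdf_to_data_url(file_path: str, max_bytes: int = MAX_BYTES) -> str:
    """Return a base64 data URL for a PDF, or raise if the file is too large."""
    path = Path(file_path)
    size = path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{path.name} is {size} bytes; limit is {max_bytes}")
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:application/pdf;base64,{encoded}"
```

The returned string can be passed as the `document_url` value in the request shown above.
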
§ 07 · Pricing

Pricing comparison.

| Service | Cost per 1,000 Pages | Type |
| --- | --- | --- |
| Mistral OCR | $1.00 | API |
| Mistral OCR (batch) | $0.50 | API |
| GPT-5.4 Vision | ~$5-15 | API |
| Google Document AI | $1.50 | API |
| Docling | $0 (self-hosted) | Open Source |

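
The per-1,000-page prices above translate into a job cost with simple arithmetic. The sketch below hard-codes the two Mistral rates from this table. The function name and the `mode` labels are illustrative, and the rates should be checked against current pricing before budgeting.

```python
# Rates in USD per 1,000 pages, copied from the table above.
PRICE_PER_1000_PAGES = {"standard": 1.00, "batch": 0.50}


def estimate_cost(pages: int, mode: str = "standard") -> float:
    """Estimate the USD cost of processing `pages` pages at the given rate."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * PRICE_PER_1000_PAGES[mode] / 1000


if __name__ == "__main__":
    print(f"9 pages, standard:   ${estimate_cost(9):.3f}")
    print(f"100,000 pages, batch: ${estimate_cost(100_000, 'batch'):.2f}")
```

At the standard rate, a 9-page document comes to roughly $0.009.
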
§ 08 · Real test

Mistral OCR vs Docling.

Same document: Docling paper (arxiv:2408.09869), tested December 17, 2025

| Metric | Mistral OCR | Docling |
| --- | --- | --- |
| Processing Time | 9.04 seconds | 34.95 seconds |
| Output Size | 34,656 chars | 33,201 chars |
| Pages Processed | 9 pages | 10 pages |
| Cost (this test) | ~$0.009 | $0.00 |
| Data Privacy | Sent to Mistral | Fully local |
| Table Export | Markdown only | DataFrame/CSV |
| License | Proprietary API | MIT (open source) |

Mistral finished about 3.9x faster in wall-clock time on this test (9.04 s vs 34.95 s) but is billed per page. Docling is free but requires local compute.
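
Because the two tools reported different page counts for the same document (9 vs 10), a per-page rate is a slightly fairer comparison than raw wall-clock time. The arithmetic below uses only the figures from the table above; the variable and function names are illustrative.

```python
def seconds_per_page(total_seconds: float, pages: int) -> float:
    """Average processing time per page."""
    return total_seconds / pages


# Figures from the comparison table above.
mistral_rate = seconds_per_page(9.04, 9)    # about 1.00 s/page
docling_rate = seconds_per_page(34.95, 10)  # about 3.50 s/page

wall_clock_ratio = 34.95 / 9.04               # about 3.87x
per_page_ratio = docling_rate / mistral_rate  # about 3.48x

if __name__ == "__main__":
    print(f"Mistral: {mistral_rate:.2f} s/page, Docling: {docling_rate:.2f} s/page")
    print(f"Wall-clock speedup: {wall_clock_ratio:.2f}x")
    print(f"Per-page speedup:   {per_page_ratio:.2f}x")
```

On a per-page basis the gap narrows to roughly 3.5x, and it is a single run on a single document, so it should be read as indicative only.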

§ 09 · When to use

Fit for purpose.

Good For
  • High-volume processing (2000+ pages)
  • Scientific papers with equations
  • Multilingual documents
  • Quick prototyping (no setup)
  • When data privacy isn't critical
Consider Alternatives For
  • Sensitive documents (use Docling)
  • Complex structured extraction
  • Low-budget high-volume processing
  • Air-gapped environments
  • Custom model fine-tuning needs
§ 10 · Resources

Continue reading.

  • Official Documentation: API reference and examples
  • Announcement: Original launch post with benchmarks
  • Docling Guide: Open-source alternative
  • GPT-4o vs PaddleOCR: Vision LLM comparison
