Open Source · Dec 2025

dots.ocr 3B: Unified Multilingual Document Parser

A 3B parameter open-source model that handles text, tables, and formulas in 100+ languages with a single unified architecture.

100+ Language Support

Unlike most OCR models, which focus on English and Chinese, dots.ocr delivers SOTA performance on low-resource languages including Tibetan, Kannada, Russian, Arabic, and more.

Apache 2.0 · 3B params · Runs locally · Unified model

  • OmniDocBench Composite: 88.41
  • Text Accuracy: 95.2% (0.048 edit distance)
  • Table TEDS (structure): 86.8%
  • Formula CDM (LaTeX): 83.2%

OmniDocBench Comparison

OmniDocBench tests end-to-end document parsing across 1,355 pages with text, tables, and formulas. Composite score: ((1 − TextEditDist) × 100 + TableTEDS + FormulaCDM) / 3

Model           Composite   Text     Tables   Formulas
PaddleOCR-VL    92.86       -        93.5%    -
dots.ocr 3B     88.41       95.2%    86.8%    83.2%
Mistral OCR 3   79.75       90.1%    70.9%    78.2%
clearOCR        31.7        84.6%    0.8%     ~10%
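The composite score can be checked by hand from the per-task numbers in the table. A minimal sketch, plugging in dots.ocr's reported values:

```python
# Reproduce the OmniDocBench composite score for dots.ocr 3B
# from its per-task results (values from the table above).
text_edit_dist = 0.048   # lower is better
table_teds = 86.8        # table structure score, %
formula_cdm = 83.2       # formula score, %

# Composite = ((1 - TextEditDist) * 100 + TableTEDS + FormulaCDM) / 3
composite = ((1 - text_edit_dist) * 100 + table_teds + formula_cdm) / 3
print(round(composite, 2))  # ≈ 88.4, matching the reported 88.41 up to rounding
```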

Key Advantages

Unified Architecture

Single 1.7B LLM foundation handles layout detection, text recognition, table parsing, and formula extraction. No multi-model pipelines.

Multilingual SOTA

Best-in-class on low-resource languages. Tested on dots.ocr-bench covering 1,493 images across 100 languages.

Prompt-Based Control

Natural language prompts define tasks: full parsing, layout-only, or targeted region extraction.
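In practice this means the task is selected by swapping the prompt string, not by loading a different model. A minimal sketch; the prompt strings below other than `<parse_document>` are hypothetical placeholders, so check the model card for the actual ones:

```python
# Hypothetical mapping from task name to prompt string.
# "<parse_document>" appears in the code example in this article;
# the other two strings are illustrative placeholders only.
TASK_PROMPTS = {
    "full_parse": "<parse_document>",   # layout + text + tables + formulas
    "layout_only": "<detect_layout>",   # hypothetical: bounding boxes only
    "region_ocr": "<ocr_region>",       # hypothetical: targeted region extraction
}

def build_prompt(task: str) -> str:
    """Return the prompt string that selects a parsing task."""
    return TASK_PROMPTS[task]
```

The same model weights serve all three tasks; only the text input to the processor changes.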

Efficient Inference

Faster than larger models while maintaining accuracy. Runs on consumer GPUs.

olmOCR Benchmark

  • Pass Rate: 79.1%
  • Table Recognition: 88.3% (#1 ranking)
  • Parameters: 3B

Code Example

from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

# Load dots.ocr model
model = AutoModelForCausalLM.from_pretrained(
    "rednote-hilab/dots.ocr-3b",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("rednote-hilab/dots.ocr-3b")

# Parse document
image = Image.open("document.png")
inputs = processor(
    text="<parse_document>",  # Prompt for full parsing
    images=image,
    return_tensors="pt"
)

outputs = model.generate(**inputs, max_new_tokens=4096)
result = processor.decode(outputs[0], skip_special_tokens=True)
print(result)  # JSON with layout, text, tables, formulas

When to Use dots.ocr

Excellent For
  • Multilingual documents (100+ languages)
  • Low-resource languages (Tibetan, Kannada, etc.)
  • On-premise/private deployment
  • Tables + formulas in one model
  • Cost-sensitive high-volume processing
Considerations
  • Requires GPU for efficient inference
  • 3B model needs ~8GB VRAM
  • PaddleOCR-VL still leads on pure accuracy
  • API solutions may be faster to integrate
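The ~8GB VRAM figure follows from simple arithmetic on the parameter count. A back-of-envelope sketch:

```python
# Back-of-envelope VRAM estimate for a 3B-parameter model in bfloat16.
params = 3e9
bytes_per_param = 2  # bfloat16 = 16 bits
weights_gb = params * bytes_per_param / 1e9
print(weights_gb)  # 6.0 GB for the weights alone

# Activations, KV cache, and framework overhead typically add another
# 1-2 GB at moderate sequence lengths, consistent with the ~8GB figure.
```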

Verdict

  • OmniDocBench: 88.41 (#2 open source)
  • Table TEDS: 86.8% (strong tables)
  • Languages: 100+ (multilingual SOTA)

dots.ocr 3B is the best open-source choice for multilingual document parsing. It combines text, table, and formula recognition in a single efficient model.

For English/Chinese-only workloads, PaddleOCR-VL has slightly higher accuracy. Among API-based solutions, Mistral OCR 3 is faster to integrate but has lower accuracy.

Best use case: Organizations needing to process documents in multiple languages with data privacy requirements.

GitHub: rednote-hilab/dots.ocr
License: Apache 2.0
Model: 3B parameters, runs on consumer GPUs
