OPEN SOURCE | Dec 2025

dots.ocr 3B: Unified Multilingual Document Parser

A 3B-parameter open-source model that handles text, tables, and formulas in 100+ languages with a single unified architecture.

100+ Language Support

Unlike most OCR models, which focus on English and Chinese, dots.ocr delivers SOTA performance across many languages, including low-resource ones such as Tibetan and Kannada as well as Russian, Arabic, and more.

Apache 2.0 | 3B params | Runs locally | Unified model

OmniDocBench composite: 88.41
Text accuracy: 95.2% (0.048 edit distance)
Table TEDS (structure): 86.8%
Formula CDM (LaTeX): 83.2%

OmniDocBench Comparison

OmniDocBench tests end-to-end document parsing across 1,355 pages with text, tables, and formulas. Composite score: ((1-TextEditDist)*100 + TableTEDS + FormulaCDM) / 3

Model	Composite	Text	Tables	Formulas
PaddleOCR-VL	92.86	-	93.5%	-
dots.ocr 3B	88.41	95.2%	86.8%	83.2%
Mistral OCR 3	79.75	90.1%	70.9%	78.2%
clearOCR	31.7	84.6%	0.8%	~10%
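
The composite formula above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `composite_score` is ours, not part of OmniDocBench. With the rounded component scores from the table it gives 88.40 for dots.ocr 3B, which is close to the reported 88.41 (the small gap presumably comes from rounding of the published components).

```python
def composite_score(text_edit_dist: float, table_teds: float, formula_cdm: float) -> float:
    """Composite = ((1 - TextEditDist) * 100 + TableTEDS + FormulaCDM) / 3."""
    # Convert the normalized edit distance into a 0-100 text accuracy score.
    text_accuracy = (1.0 - text_edit_dist) * 100.0
    return (text_accuracy + table_teds + formula_cdm) / 3.0


# dots.ocr 3B row: 0.048 edit distance, 86.8 TEDS, 83.2 CDM
dots = composite_score(0.048, 86.8, 83.2)
print(f"dots.ocr 3B composite: {dots:.2f}")
```

Because the three components are weighted equally, a weak table or formula score pulls the composite down sharply, which is why clearOCR lands near 31.7 despite 84.6% text accuracy.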

Key Advantages

Unified Architecture

A single model built on a 1.7B LLM foundation handles layout detection, text recognition, table parsing, and formula extraction. There is no multi-model pipeline to maintain.

Multilingual SOTA

Best-in-class on low-resource languages. Tested on dots.ocr-bench covering 1,493 images across 100 languages.

Prompt-Based Control

Natural language prompts define tasks: full parsing, layout-only, or targeted region extraction.
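
As an illustration of prompt-based task selection, the sketch below maps the three task types named above to prompt strings. Everything here is hypothetical: the prompt wording, the `PROMPTS` table, and the `build_prompt` helper are placeholders for illustration, and the actual prompts are defined by the dots.ocr project.

```python
from typing import Optional, Sequence

# Hypothetical prompt templates, for illustration only. The real prompt
# strings shipped with dots.ocr will differ.
PROMPTS = {
    "full": "Parse the full page: output layout, text, tables, and formulas.",
    "layout": "Detect layout regions only; do not transcribe their content.",
    "region": "Extract the content inside the bounding box {bbox}.",
}


def build_prompt(task: str, bbox: Optional[Sequence[int]] = None) -> str:
    """Return the prompt for a task; 'region' requires a 4-value bounding box."""
    if task not in PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PROMPTS)}")
    if task == "region":
        if bbox is None or len(bbox) != 4:
            raise ValueError("region extraction needs bbox=(x1, y1, x2, y2)")
        return PROMPTS[task].format(bbox=list(bbox))
    return PROMPTS[task]


print(build_prompt("layout"))
print(build_prompt("region", bbox=(10, 20, 300, 400)))
```

The point of the design is that switching tasks changes only the text input; the model and the image preprocessing stay the same.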

Efficient Inference

Faster inference than larger models while maintaining accuracy. Runs on consumer GPUs with roughly 8GB of VRAM.

olmOCR Benchmark

Pass Rate: 79.1%
Table Recognition: 88.3%
Parameters: #1 ranking

Code Example

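from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

# Load the dots.ocr model. The model ID is the one given on this page;
# verify it against the rednote-hilab/dots.ocr repository before use.
MODEL_ID = "rednote-hilab/dots.ocr-3b"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # assumption: the checkpoint ships custom model code
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

# Parse document
image = Image.open("document.png").convert("RGB")
inputs = processor(
    text="<parse_document>",  # Prompt for full parsing (check the repo for exact prompts)
    images=image,
    return_tensors="pt"
).to(model.device)  # move tensors to the same device as the model

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4096)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = processor.decode(new_tokens, skip_special_tokens=True)
print(result)  # JSON with layout, text, tables, formulas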
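
Since the result is described as JSON, a small post-processing step is usually the next thing needed. The sketch below assumes the output is a JSON list of elements with `category` and `text` keys; that schema and the `group_by_category` helper are assumptions for illustration, so check the actual output format before relying on it.

```python
import json
from collections import defaultdict
from typing import Dict, List


def group_by_category(raw: str) -> Dict[str, List[str]]:
    """Group element texts by layout category; return {} if the output is not valid JSON."""
    try:
        elements = json.loads(raw)
    except json.JSONDecodeError:
        # Long generations can be truncated at max_new_tokens, leaving invalid JSON.
        return {}
    if not isinstance(elements, list):
        return {}
    grouped: Dict[str, List[str]] = defaultdict(list)
    for element in elements:
        if isinstance(element, dict):
            grouped[element.get("category", "Unknown")].append(element.get("text", ""))
    return dict(grouped)


# Hand-written sample in the assumed schema, not real model output.
sample = '[{"category": "Title", "text": "Report"}, {"category": "Table", "text": "<table></table>"}]'
print(group_by_category(sample))
```

Returning an empty dict on malformed output keeps a batch job running when a single page fails to parse.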

When to Use dots.ocr

Excellent For

Multilingual documents (100+ languages)
Low-resource languages (Tibetan, Kannada, etc.)
On-premise/private deployment
Tables + formulas in one model
Cost-sensitive high-volume processing

Considerations

Requires GPU for efficient inference
3B model needs ~8GB VRAM
PaddleOCR-VL still leads on OmniDocBench (92.86 vs. 88.41 composite)
API solutions may be faster to integrate
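
The ~8GB VRAM figure above is consistent with a back-of-envelope estimate: 3B parameters stored in bfloat16 (2 bytes each) take about 6 GB for the weights alone, and the remainder is headroom for activations, the generation cache, and image features. The sketch below only computes the weight portion; the helper name is ours, and actual usage depends on image resolution and output length.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# 3B parameters in bfloat16 (2 bytes per parameter)
weights_gb = weight_memory_gb(3e9, 2)
print(f"bf16 weights: {weights_gb:.1f} GB")
```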

Verdict

OmniDocBench: 88.41 (#2 open source)
Table TEDS: 86.8% (strong tables)
Languages: 100+ (multilingual SOTA)

dots.ocr 3B is the best open-source choice for multilingual document parsing. It combines text, table, and formula recognition in a single efficient model.

For English/Chinese-only workloads, PaddleOCR-VL scores higher on OmniDocBench (92.86 vs. 88.41 composite). Among API-based solutions, Mistral OCR 3 is faster to integrate but less accurate (79.75 composite).

Best use case: Organizations needing to process documents in multiple languages with data privacy requirements.

GitHub: rednote-hilab/dots.ocr
License: Apache 2.0
Model: 3B parameters, runs on consumer GPUs