dots.ocr 3B: Unified Multilingual Document Parser
A 3B-parameter open-source model that handles text, tables, and formulas in 100+ languages with a single unified architecture.
100+ Language Support
Most OCR models focus on English and Chinese. dots.ocr covers 100+ languages and delivers SOTA performance on multilingual text, including low-resource languages such as Tibetan and Kannada as well as Russian and Arabic.
OmniDocBench Comparison
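OmniDocBench tests end-to-end document parsing across 1,355 pages with text, tables, and formulas. The composite score is the mean of three per-task scores, each on a 0-100 scale: Composite = ((1 - TextEditDist) * 100 + TableTEDS + FormulaCDM) / 3. A "-" in the table means the per-task score is not listed.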
| Model | Composite | Text (1 - edit distance) | Tables (TEDS) | Formulas (CDM) |
|---|---|---|---|---|
| PaddleOCR-VL | 92.86 | - | 93.5% | - |
| dots.ocr 3B | 88.41 | 95.2% | 86.8% | 83.2% |
| Mistral OCR 3 | 79.75 | 90.1% | 70.9% | 78.2% |
| clearOCR | 31.7 | 84.6% | 0.8% | ~10% |
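The composite formula can be checked against the table rows. The following is a minimal Python sketch; the function names `composite_score` and `text_score` are illustrative assumptions, and the inputs are the per-task percentages exactly as printed in the table.

```python
def text_score(edit_distance: float) -> float:
    """Convert a normalized edit distance (0-1, lower is better) to a 0-100 score."""
    return (1 - edit_distance) * 100


def composite_score(text_pct: float, table_teds: float, formula_cdm: float) -> float:
    """Mean of the three per-task scores, each on a 0-100 scale."""
    return (text_pct + table_teds + formula_cdm) / 3


# Per-task scores copied from the table (text, tables, formulas).
rows = {
    "dots.ocr 3B": (95.2, 86.8, 83.2),
    "Mistral OCR 3": (90.1, 70.9, 78.2),
}

for name, scores in rows.items():
    print(f"{name}: {composite_score(*scores):.2f}")
```

Recomputing from the rounded per-task values gives 88.40 and 79.73, slightly off the published 88.41 and 79.75, presumably because the published composites were computed from unrounded scores.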
Key Advantages
A single model built on a 1.7B LLM foundation (3B parameters in total) handles layout detection, text recognition, table parsing, and formula extraction. No multi-model pipelines.
Best-in-class on low-resource languages. Tested on dots.ocr-bench covering 1,493 images across 100 languages.
Natural language prompts define tasks: full parsing, layout-only, or targeted region extraction.
Faster than larger models while maintaining accuracy. Runs on consumer GPUs.
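Prompt-driven task selection can be pictured as a small lookup from task name to prompt text. The sketch below is illustrative only: the `TASK_PROMPTS` strings, the task names, and the `build_prompt` helper are all hypothetical and are not the prompts shipped with dots.ocr, which should be taken from the project's own documentation.

```python
from typing import Optional, Tuple

# Hypothetical prompt templates, for illustration only. The real prompt
# strings are defined by the dots.ocr project and are not reproduced here.
TASK_PROMPTS = {
    "full": "Parse the page: return layout, text, tables, and formulas.",
    "layout": "Detect layout regions only; do not transcribe content.",
    "region": "Extract the content inside the bounding box {bbox}.",
}


def build_prompt(task: str, bbox: Optional[Tuple[int, int, int, int]] = None) -> str:
    """Return the prompt for a task; region extraction also needs a bounding box."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    template = TASK_PROMPTS[task]
    if task == "region":
        if bbox is None:
            raise ValueError("region extraction needs a bounding box")
        return template.format(bbox=list(bbox))
    return template


print(build_prompt("layout"))
print(build_prompt("region", bbox=(10, 20, 300, 400)))
```

Keeping the task in the prompt rather than in separate models means the same weights and the same inference call serve all three modes.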
Code Example
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Model ID and prompt token are as given in this article; check the model card
# before use (it may also require trust_remote_code=True).
MODEL_ID = "rednote-hilab/dots.ocr-3b"

# Load the dots.ocr model and its processor
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Parse a document image
image = Image.open("document.png").convert("RGB")
inputs = processor(
    text="<parse_document>",  # Prompt for full parsing
    images=image,
    return_tensors="pt",
).to(model.device)  # Keep inputs on the same device as the model

outputs = model.generate(**inputs, max_new_tokens=4096)
result = processor.decode(outputs[0], skip_special_tokens=True)
print(result)  # JSON with layout, text, tables, formulas

When to Use dots.ocr
Good fit:
- Multilingual documents (100+ languages)
- Low-resource languages (Tibetan, Kannada, etc.)
- On-premise/private deployment
- Tables + formulas in one model
- Cost-sensitive high-volume processing
Considerations:
- Requires a GPU for efficient inference
- The 3B model needs ~8GB of VRAM
- PaddleOCR-VL still leads on OmniDocBench composite score (92.86 vs. 88.41)
- API solutions may be faster to integrate
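The ~8GB VRAM figure is consistent with a back-of-the-envelope estimate of weight memory. The sketch below is a rough approximation, not a measurement: the helper name `weight_memory_gb` is an assumption, it counts weights only, and real usage also depends on activations, the KV cache, image resolution, and output length.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


params = 3e9  # 3B parameters, as stated in the text

# bfloat16 stores each parameter in 2 bytes, float32 in 4 bytes.
print(f"bfloat16 weights: {weight_memory_gb(params):.1f} GB")
print(f"float32 weights:  {weight_memory_gb(params, 4):.1f} GB")
```

In bfloat16 the weights take about 6 GB, which leaves roughly 2 GB of an 8GB budget for activations and the generation cache.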
Verdict
Among the models compared here, dots.ocr 3B is the best open-source choice for multilingual document parsing. It combines text, table, and formula recognition in a single efficient model.
For English/Chinese-only workloads, PaddleOCR-VL scores higher on OmniDocBench (92.86 vs. 88.41 composite). For API-based solutions, Mistral OCR 3 is faster to integrate but less accurate (79.75 composite).
Best use case: Organizations needing to process documents in multiple languages with data privacy requirements.
GitHub: rednote-hilab/dots.ocr
License: Apache 2.0
Model: 3B parameters, runs on consumer GPUs