Model card

PaddleOCR-VL-1.5.

Baidu PaddlePaddle · open-source · 0.9B params · Multi-Task VLM (0.9B params) · Apache 2.0

Next-gen of PaddleOCR-VL with 0.9B params. SOTA on OmniDocBench v1.5 (94.5%).

§ 01 · Card

Model card, inline.

Rendered server-side from the upstream README on Hugging Face — same content as the source repo, with editorial typography. The full card, sample weights, and revision history live on HF.


Source: PaddlePaddle/PaddleOCR-VL-1.5
License: apache-2.0
Pipeline: image-text-to-text

<div align="center">

<h1 align="center">

PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing

</h1>

[repo](https://github.com/PaddlePaddle/PaddleOCR) [HuggingFace](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5) [ModelScope](https://modelscope.cn/models/PaddlePaddle/PaddleOCR-VL-1.5) [HuggingFace Demo](https://huggingface.co/spaces/PaddlePaddle/PaddleOCR-VL-1.5OnlineDemo) [ModelScope Demo](https://modelscope.cn/studios/PaddlePaddle/PaddleOCR-VL-1.5OnlineDemo/summary) [Discord](https://discord.gg/JPmZXDsEEK) [X](https://x.com/PaddlePaddle) [License](./LICENSE)

🔥 [Official Website](https://www.paddleocr.com) 📝 [Technical Report](https://arxiv.org/abs/2601.21957)

</div>

<div align="center"> <img src="https://raw.githubusercontent.com/cuicheng01/PaddleXdocimages/refs/heads/main/images/paddleocrvl15/paddleocr-vl-1.5metrics.png" width="800"/> </div>

Introduction

PaddleOCR-VL-1.5 is the next generation of PaddleOCR-VL, achieving a new state-of-the-art accuracy of 94.5% on OmniDocBench v1.5. To rigorously evaluate robustness against real-world physical distortions (scanning artifacts, skew, warping, screen photography, and illumination), we propose the Real5-OmniDocBench benchmark. Experimental results demonstrate that this enhanced model attains SOTA performance on the newly curated benchmark. Furthermore, we extend the model’s capabilities by incorporating seal recognition and text spotting tasks, while remaining a 0.9B ultra-compact VLM with high efficiency.

Key Capabilities of PaddleOCR-VL-1.5

  1. With a parameter size of 0.9B, PaddleOCR-VL-1.5 achieves 94.5% accuracy on OmniDocBench v1.5, surpassing the previous SOTA model PaddleOCR-VL. Significant improvements are observed in table, formula, and text recognition.
  2. It introduces an innovative approach to document parsing by supporting irregular-shaped localization, enabling accurate polygonal detection under skewed and warped document conditions. Evaluations across five real-world scenarios (scanning, skew, warping, screen photography, and illumination) demonstrate superior performance over mainstream open-source and proprietary models.
  3. The model introduces text spotting (text-line localization and recognition), along with seal recognition, with all corresponding metrics setting new SOTA results in their respective tasks.
  4. PaddleOCR-VL-1.5 further strengthens its capability in specialized scenarios and multilingual recognition. Recognition performance is improved for rare characters, ancient texts, multilingual tables, underlines, and checkboxes, and language coverage is extended to include China's Tibetan script and Bengali.
  5. The model supports automatic cross-page table merging and cross-page paragraph heading recognition, effectively mitigating content fragmentation issues in long-document parsing.
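
To illustrate why polygonal localization matters under skew, the sketch below compares a skewed text-line quadrilateral with its axis-aligned bounding box. This is only a geometric illustration: the point format and the function names are assumptions made for this example, and the card does not describe the model's actual output structure.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def box_fill_ratio(points):
    """Fraction of the bounding box that the polygon actually covers."""
    x0, y0, x1, y1 = bounding_box(points)
    box_area = (x1 - x0) * (y1 - y0)
    return polygon_area(points) / box_area if box_area else 0.0


# Hypothetical skewed text line (a parallelogram), in pixel coordinates.
skewed_line = [(0, 0), (100, 20), (100, 40), (0, 20)]
print(box_fill_ratio(skewed_line))
```

For this hypothetical line the polygon covers only half of its bounding box, so a rectangular box would pull in as much background or neighboring text as actual content.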

Model Architecture

<div align="center"> <img src="https://raw.githubusercontent.com/cuicheng01/PaddleXdocimages/refs/heads/main/images/paddleocrvl1_5/PaddleOCR-VL-1.5.png" width="800"/> </div>

News

  • ``2026.03.06`` 🚀 Support llama.cpp inference for the VLM component in PaddleOCR-VL-1.5. Click here for details.
  • ``2026.01.29`` 🚀 We release PaddleOCR-VL-1.5, a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing.

Usage

Install Dependencies

Install PaddlePaddle and PaddleOCR:

```bash
# The following command installs the PaddlePaddle version for CUDA 12.6. For other CUDA versions and the CPU version, please refer to https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/linux-pip_en.html
python -m pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
python -m pip install -U "paddleocr[doc-parser]>=3.4.0"
```
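
After installing, it can be useful to confirm that the installed `paddleocr` distribution satisfies the `>=3.4.0` requirement. The following is a minimal sketch using only the standard library; the helper names are made up for this example, and the parser is deliberately simple (it ignores suffixes such as `rc1`, so a pre-release is treated like its final release).

```python
from importlib import metadata


def parse_version(text):
    """Turn '3.4.0' into (3, 4, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="3.4.0"):
    """True when the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '3.4' compares equal to '3.4.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_paddleocr_ok():
    """Check the installed paddleocr distribution; False if it is missing."""
    try:
        return meets_minimum(metadata.version("paddleocr"))
    except metadata.PackageNotFoundError:
        return False
```

Calling `installed_paddleocr_ok()` returns `False` both when the package is absent and when it is older than 3.4.0.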

Basic Usage

CLI usage:

```bash
paddleocr doc_parser -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png --pipeline_version v1.5
```
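
If the CLI needs to be driven from a script, one option is to build the same command as an argument list and pass it to `subprocess`. This sketch uses only the subcommand and the two flags shown above; the function names are hypothetical, and it assumes the `paddleocr` executable is on `PATH`.

```python
import shlex
import subprocess


def build_doc_parser_cmd(image, pipeline_version="v1.5"):
    """Argument list equivalent to the CLI call shown above."""
    return [
        "paddleocr", "doc_parser",
        "-i", image,
        "--pipeline_version", pipeline_version,
    ]


def run_doc_parser(image, pipeline_version="v1.5"):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    cmd = build_doc_parser_cmd(image, pipeline_version)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)
```

Passing a list instead of a single shell string avoids quoting problems when the input path contains spaces.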

Python API usage:

```python
from paddleocr import PaddleOCRVL

pipeline = PaddleOCRVL(pipeline_version="v1.5")
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")
for res in output:
    res.print()
    res.save_to_json(save_path="output")
    res.save_to_markdown(save_path="output")
```
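
The same calls can be wrapped in a loop to process a folder of images. The sketch below is one possible arrangement, not an official recipe: `find_images`, `parse_folder`, and the suffix list are assumptions for this example, and it assumes `predict` accepts a local file path in the same way it accepts the URL above.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to the files you actually have.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_images(folder):
    """Return image files directly inside `folder`, in sorted order."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def parse_folder(folder, out_dir="output"):
    """Run the v1.5 pipeline on every image in `folder`."""
    # Imported lazily so that find_images works without paddleocr installed.
    from paddleocr import PaddleOCRVL

    pipeline = PaddleOCRVL(pipeline_version="v1.5")  # built once, reused
    for image in find_images(folder):
        for res in pipeline.predict(str(image)):
            res.save_to_json(save_path=out_dir)
            res.save_to_markdown(save_path=out_dir)
```

Creating the pipeline once outside the loop avoids reloading the model for every image.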

Accelerate VLM Inference via Optimized Inference Servers

  1. Start the VLM inference server:

You can start the vLLM inference service using one of two methods:

- Method 1: PaddleOCR method

```bash
docker run \
    --rm \
    --gpus all \
    --network host \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-nvidia-gpu \
    paddleocr genai_server --model_name PaddleOCR-VL-1.5-0.9B --host 0.0.0.0 --port 8080 --backend vllm
```

- Method 2: vLLM method
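
Whichever method starts the server, model loading can take a while, so it may help to wait until the port accepts connections before sending requests. The following is a small standard-library sketch; `wait_for_port` is a hypothetical helper, the default port mirrors the `--port 8080` value above, and a successful TCP connection only shows that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8080, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait a little and retry.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    ready = wait_for_port()
    print("server reachable" if ready else "server not reachable")
```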

Card content reproduced from huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5 under the upstream license. Rendering trims fenced HTML, raw widgets and tables for safety; tap the link for the untouched original.
§ 02 · Benchmarks

Every benchmark PaddleOCR-VL-1.5 has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | olmOCR-Bench | Computer Vision · Document Parsing | pass-rate | 79.1% | #9/21 | | source ↗ |
Rank column shows this model’s position vs all other models scored on the same benchmark + metric (competitors after the slash). #1 in red means current SOTA. Sorted by rank, then newest result.
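
As a rough illustration of how a rank label such as `#9/21` could be derived, the sketch below sorts models by score and reports position over count. It assumes that a higher score is better and that the number after the slash counts every scored model including this one; the page does not state either detail. The scores are invented for the example and are not leaderboard data.

```python
def rank_label(scores, model):
    """Rank `model` among all scored models, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return f"#{ordered.index(model) + 1}/{len(ordered)}"


# Hypothetical pass-rates, for illustration only.
example_scores = {"model-a": 82.0, "model-b": 77.5, "model-c": 75.3}
print(rank_label(example_scores, "model-b"))
```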
§ 03 · Strengths by area

Where PaddleOCR-VL-1.5 actually performs.

Computer Vision: 1 benchmark, avg rank #9.0
§ 06 · Sources & freshness

Where these numbers come from.

paper: 1 result
0 of 1 rows marked verified.