Tesseract OCR example output

A real Tesseract output sample with boxes and OCR errors.

Tesseract is fast, local, and battle-tested. This page shows its raw invoice OCR output, the bounding boxes it emits, and the exact recognition mistakes to watch for.

Figure: Tesseract OCR example output with bounding boxes on an invoice

Tesseract bounding boxes

Word-level boxes are useful for audits, but table rows and slash-heavy labels can still produce recognition errors.

Word boxes: 47. Tesseract emits many word-level regions.

Avg confidence: 0.911. Solid, but lower on table and punctuation fields.

Output lines: 14. Plain text is immediately available.

Best use: Local. Fast CPU OCR for clean printed text.

Raw Tesseract output

INVOICE
Invoice #: INV-2025-001
Date: December 16, 2025
Due Date: January 15, 2026
Bill To:
Acme Corporation
123 Business Ave
New York, NY 10001
Description ay Price Total
Web Development 40 $125.00 $5,000.00
UWUX Design 20 $100.00 $2,000.00
Subtotal: $7,000.00
Tax (8%) $560.00
Total: $7,560.00

Tesseract OCR code example

Use `image_to_string` for quick text extraction and `image_to_data` when you need word boxes and confidence scores. The visual overlay is the fastest way to debug missed or merged regions.

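import pytesseract
from PIL import Image

image = Image.open("sample_invoice.png")

# Plain text for the whole page as a single string.
text = pytesseract.image_to_string(image)

# Per-region data as parallel lists keyed by "text", "conf", "left", "top", "width", "height", ...
boxes = pytesseract.image_to_data(
    image,
    output_type=pytesseract.Output.DICT,
)

print(text)
# Rows that are not words (page, block, paragraph, line) have empty text and a confidence of -1.
print(boxes["text"], boxes["conf"])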
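To turn the `image_to_data` dictionary into figures like the word count and average confidence listed earlier, one option is to skip the non-word rows and average what remains. This is a minimal sketch: the helper name `summarize_words`, the scaling to a 0 to 1 range, and the threshold of 60 are assumptions for illustration, not necessarily how the numbers on this page were computed. The sample dictionary is hand-made, with invented confidences, so the block runs without Tesseract installed.

```python
def summarize_words(data, low_conf=60.0):
    """Summarize a dict shaped like pytesseract.image_to_data(..., output_type=Output.DICT)."""
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # some pytesseract versions return confidences as strings
        # Non-word rows (page, block, paragraph, line) carry conf -1 and empty text.
        if conf < 0 or not text.strip():
            continue
        words.append((text, conf))
    mean_conf = sum(c for _, c in words) / len(words) if words else 0.0
    return {
        "word_count": len(words),
        "mean_confidence": round(mean_conf / 100, 3),  # Tesseract reports 0-100
        "low_confidence": [t for t, c in words if c < low_conf],
    }


# Hand-made sample in the same shape; the confidence values are invented.
sample = {
    "text": ["", "Description", "ay", "Price", "Total"],
    "conf": ["-1", "96", 40, 95, 97],
}
summary = summarize_words(sample)
print(summary)
```

With real output, pass the `boxes` dictionary from the example above instead of `sample`. Words that land in `low_confidence` are good candidates for a visual check against the overlay.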

Error analysis

Common mistakes visible in the output

People searching for OCR examples usually arrive looking for images. What matters is showing the output and the visual reason behind each mistake, not just describing the model.

Expected "Qty", got "ay". The table header is misread, which can break invoice line-item parsing.

Expected "UI/UX Design", got "UWUX Design". Slash-heavy design text is a common OCR edge case.

Expected "Tax (8%):", got "Tax (8%)". Punctuation can disappear even when the number itself is preserved.

Layout: word boxes. Word-level output is useful, but downstream layout assembly is still required.
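As a rough illustration of what layout assembly involves, the sketch below regroups word-level rows into text lines using the `page_num`, `block_num`, `par_num`, and `line_num` keys that `image_to_data` returns, ordering words by their `left` coordinate. The helper name `words_to_lines` is hypothetical and the sample dictionary is hand-made. The sketch also assumes that block order matches reading order, which may not hold for multi-column pages.

```python
from collections import defaultdict


def words_to_lines(data):
    """Rebuild text lines from a dict shaped like image_to_data(..., output_type=Output.DICT)."""
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # skip structural rows and empty tokens
        key = (
            data["page_num"][i],
            data["block_num"][i],
            data["par_num"][i],
            data["line_num"][i],
        )
        lines[key].append((data["left"][i], text))
    # Order lines by their page/block/paragraph/line ids, and words left to right.
    return [
        " ".join(word for _, word in sorted(words, key=lambda w: w[0]))
        for _, words in sorted(lines.items())
    ]


# Hand-made sample: words arrive out of order within the first line.
sample = {
    "page_num": [1, 1, 1, 1, 1, 1],
    "block_num": [1, 1, 1, 1, 1, 1],
    "par_num": [1, 1, 1, 1, 1, 1],
    "line_num": [0, 1, 1, 1, 2, 2],
    "left": [0, 300, 40, 90, 40, 300],
    "text": ["", "$560.00", "Tax", "(8%)", "Total:", "$7,560.00"],
}
result = words_to_lines(sample)
print(result)
```

Table rows need more than this: column assignment usually depends on comparing `left` positions across lines, which is where a misread header such as "ay" starts to hurt.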

Figure: Sample invoice image used for Tesseract OCR example output

Input document

The same invoice image is used across examples so OCR engines can be compared visually.
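Alongside the visual comparison, a shared input makes it possible to diff each engine's text against a hand-typed reference. The sketch below does this at token level with Python's `difflib`. The reference lines are assembled from the expected values listed in the error analysis, and the helper name `token_mismatches` is an assumption for illustration.

```python
from difflib import SequenceMatcher


def token_mismatches(reference, ocr_text):
    """Return (expected, got) token pairs where OCR output differs from the reference."""
    ref_tokens = reference.split()
    ocr_tokens = ocr_text.split()
    matcher = SequenceMatcher(a=ref_tokens, b=ocr_tokens, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append(
                (" ".join(ref_tokens[i1:i2]), " ".join(ocr_tokens[j1:j2]))
            )
    return mismatches


reference = (
    "Description Qty Price Total\n"
    "UI/UX Design 20 $100.00 $2,000.00\n"
    "Tax (8%): $560.00"
)
tesseract = (
    "Description ay Price Total\n"
    "UWUX Design 20 $100.00 $2,000.00\n"
    "Tax (8%) $560.00"
)
found = token_mismatches(reference, tesseract)
for expected, got in found:
    print(f"expected {expected!r}, got {got!r}")
```

Running the same function on another engine's output for the same image gives a like-for-like list of misreads.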

Figure: PaddleOCR and Tesseract OCR bounding box comparison

PaddleOCR vs Tesseract boxes

The comparison view shows why layout quality matters before extracting structured fields.
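Because punctuation such as the colon after "Tax (8%)" can drop out, field extraction from the plain text benefits from tolerant patterns and a consistency check. The following is a minimal sketch using the last three lines of the raw Tesseract output. The regular expression and the helper names `parse_totals` and `totals_consistent` are assumptions for illustration and are not part of pytesseract.

```python
import re
from decimal import Decimal

# Label, optional "(8%)"-style suffix, optional colon, then a dollar amount.
AMOUNT_LINE = re.compile(r"^(Subtotal|Tax|Total)\b[^$:]*:?\s*\$([\d,]+\.\d{2})\s*$")


def parse_totals(ocr_text):
    """Extract subtotal, tax, and total amounts, tolerating a missing colon."""
    totals = {}
    for line in ocr_text.splitlines():
        match = AMOUNT_LINE.match(line.strip())
        if match:
            amount = Decimal(match.group(2).replace(",", ""))
            totals[match.group(1).lower()] = amount
    return totals


def totals_consistent(totals):
    """True when all three fields were found and subtotal + tax equals total."""
    if not all(key in totals for key in ("subtotal", "tax", "total")):
        return False
    return totals["subtotal"] + totals["tax"] == totals["total"]


raw = """Subtotal: $7,000.00
Tax (8%) $560.00
Total: $7,560.00"""
totals = parse_totals(raw)
print(totals, totals_consistent(totals))
```

The arithmetic check is a cheap guard: a misread digit in any of the three amounts would usually make the sum fail, flagging the document for review.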
