Tesseract OCR example output
A real Tesseract output sample with boxes and OCR errors.
Tesseract is fast, local, and battle-tested. This page shows its raw invoice OCR output, the bounding boxes it emits, and the exact recognition mistakes to watch for.

Tesseract bounding boxes
Word-level boxes are useful for audits, but table rows and slash-heavy labels can still produce recognition errors.
Word boxes
47
Tesseract emits many word-level regions.
Avg confidence
0.911
Solid, but lower on table and punctuation fields.
Output lines
14
Plain text is immediately available.
Best use
Local
Fast CPU OCR for clean printed text.
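The word-box count and average confidence above can be recomputed from `image_to_data` output. The sketch below assumes the dict returned with `output_type=pytesseract.Output.DICT`, where Tesseract reports confidence on a 0 to 100 scale and marks non-word entries with -1. The function names and the 0.80 threshold are illustrative choices, not part of pytesseract.

```python
from statistics import mean


def word_confidences(data):
    """Return (word, confidence) pairs with confidence scaled to 0..1.

    `data` is assumed to be the dict from pytesseract.image_to_data(...,
    output_type=pytesseract.Output.DICT).
    """
    pairs = []
    for word, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # may arrive as int, float, or str
        # Skip structural entries (conf == -1) and empty strings.
        if conf < 0 or not word.strip():
            continue
        pairs.append((word, conf / 100.0))
    return pairs


def summarize(data, threshold=0.80):
    """Count word boxes, average their confidence, and list weak words."""
    pairs = word_confidences(data)
    avg = mean(c for _, c in pairs) if pairs else 0.0
    low = [w for w, c in pairs if c < threshold]  # hypothetical cutoff
    return {
        "word_boxes": len(pairs),
        "avg_confidence": round(avg, 3),
        "low_confidence_words": low,
    }
```

Calling `summarize(boxes)` on the dict from the code example on this page should give figures comparable to the cards above. Exact values depend on the Tesseract version and the image.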
Raw Tesseract output
INVOICE
Invoice #: INV-2025-001
Date: December 16, 2025
Due Date: January 15, 2026
Bill To:
Acme Corporation
123 Business Ave
New York, NY 10001
Description ay Price Total
Web Development 40 $125.00 $5,000.00
UWUX Design 20 $100.00 $2,000.00
Subtotal: $7,000.00
Tax (8%) $560.00
Total: $7,560.00
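Because the header row comes back as "Description ay Price Total", a parser that looks up the "Qty" column by name would fail on this output. One possible workaround, sketched here, is to match line items by row shape instead: a description, an integer quantity, and two dollar amounts. It assumes each table row arrives on its own line, as `image_to_string` normally returns it. The regular expression and function names are hypothetical and fit this sample invoice only.

```python
import re
from decimal import Decimal

# Row shape: <description> <qty> $<unit price> $<line total>
ROW = re.compile(
    r"^(?P<desc>.+?)\s+(?P<qty>\d+)\s+"
    r"\$(?P<price>[\d,]+\.\d{2})\s+\$(?P<total>[\d,]+\.\d{2})$"
)


def money(value):
    """Convert '5,000.00' to Decimal('5000.00')."""
    return Decimal(value.replace(",", ""))


def parse_line_items(text):
    """Extract line items without relying on the (possibly misread) header."""
    items = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, totals, and address lines do not fit the shape
        qty = int(match["qty"])
        price = money(match["price"])
        total = money(match["total"])
        items.append({
            "description": match["desc"],
            "qty": qty,
            "unit_price": price,
            "total": total,
            # Arithmetic check: flags rows where a digit was misrecognized.
            "consistent": qty * price == total,
        })
    return items
```

The `consistent` flag is a cheap sanity check. It does not repair text errors such as "UWUX Design", which pass through unchanged in the description.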
Tesseract OCR code example
Use `image_to_string` for quick text extraction and `image_to_data` when you need word boxes and confidence scores. The visual overlay is the fastest way to debug missed or merged regions.
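import pytesseract
from PIL import Image

# Load the sample invoice image.
image = Image.open("sample_invoice.png")

# Plain text for the whole page as a single string.
text = pytesseract.image_to_string(image)

# Word-level boxes and confidences as a dict of parallel lists.
boxes = pytesseract.image_to_data(
    image,
    output_type=pytesseract.Output.DICT,
)

print(text)
print(boxes["text"], boxes["conf"])

Error analysis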
Common mistakes visible in the output
Search traffic for OCR examples is often image-led, so the priority is to show the output and the visual reason behind each mistake rather than only describe the model.
Expected: Qty
Tesseract: ay
The table header is misread, which can break invoice line-item parsing.
Expected: UI/UX Design
Tesseract: UWUX Design
Slash-heavy design text is a common OCR edge case.
Expected: Tax (8%):
Tesseract: Tax (8%)
Punctuation can disappear even when the number itself is preserved.
Layout
word boxes
Word-level output is useful, but downstream layout assembly is still required.
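As a minimal sketch of that layout assembly step, the word entries can be regrouped into text lines using the `page_num`, `block_num`, `par_num`, `line_num`, and `left` fields that `image_to_data` returns. The function name is illustrative, and the input is assumed to be the dict produced with `output_type=pytesseract.Output.DICT`.

```python
from collections import defaultdict


def group_words_into_lines(data):
    """Rebuild text lines from word-level image_to_data output."""
    lines = defaultdict(list)
    for i, raw in enumerate(data["text"]):
        word = raw.strip()
        if not word:
            continue  # structural entries carry empty text
        key = (
            data["page_num"][i],
            data["block_num"][i],
            data["par_num"][i],
            data["line_num"][i],
        )
        lines[key].append((data["left"][i], word))

    result = []
    for key in sorted(lines):
        # Order words left to right within each line.
        words = [word for _, word in sorted(lines[key])]
        result.append(" ".join(words))
    return result
```

This only restores reading order within Tesseract's own line segmentation. Aligning table columns or merging lines that Tesseract split would need additional geometry checks on `top` and `height`.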

Input document
The same invoice image is used across examples so OCR engines can be compared visually.

PaddleOCR vs Tesseract boxes
The comparison view shows why layout quality matters before extracting structured fields.
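A box overlay for this kind of visual comparison can be drawn from the `left`, `top`, `width`, and `height` fields of `image_to_data`. The sketch below is one way to do it with Pillow, not necessarily how the images on this page were produced. The function names, the red outline, and the `min_conf` filter are assumptions made for illustration.

```python
def word_rectangles(data, min_conf=0):
    """Return (left, top, right, bottom) for each recognized word.

    `data` is assumed to be the dict from pytesseract.image_to_data(...,
    output_type=pytesseract.Output.DICT). Entries with empty text or a
    confidence below `min_conf` (0..100 scale) are skipped.
    """
    rects = []
    for i, word in enumerate(data["text"]):
        if not word.strip() or float(data["conf"][i]) < min_conf:
            continue
        left, top = data["left"][i], data["top"][i]
        rects.append((left, top, left + data["width"][i], top + data["height"][i]))
    return rects


def draw_overlay(image, data, min_conf=0):
    """Return an RGB copy of a PIL image with word boxes outlined."""
    from PIL import ImageDraw  # imported lazily; requires Pillow

    overlay = image.convert("RGB")  # convert() returns a new image
    draw = ImageDraw.Draw(overlay)
    for rect in word_rectangles(data, min_conf):
        draw.rectangle(rect, outline="red", width=2)
    return overlay
```

With Pillow and pytesseract installed, `draw_overlay(image, boxes).save("overlay.png")` would write the annotated image, where `image` and `boxes` are the objects from the code example on this page.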