OCR Failure Modes
Why OCR Breaks
Systematic analysis of when and why OCR fails in production
No marketing fluff. Real failure modes with code examples for mitigation.
1. Handwriting Edge Cases
When models struggle with cursive, mixed print/cursive, and individual writing styles
What Fails
- Cursive connections: Letters blend together, making segmentation impossible
- Mixed print/cursive: Same word uses both styles (common in forms)
- Baseline drift: Text curves or tilts across the line
- Letter variations: Same person writes 'a' three different ways
- Crossed-out text: Corrections and strikethroughs read as content
Why It Fails
Traditional OCR relies on template matching against known character shapes. Handwriting has infinite variability - there's no template to match.
Character segmentation assumes clear boundaries between letters. Cursive writing is inherently connected, breaking this assumption.
Training data is biased toward printed text. Most OCR models see 100x more typed characters than handwritten ones during training.
Solutions
Model Selection
- GPT-4o: Best for mixed print/cursive, uses context to resolve ambiguity
- CHURRO 3B: Specialized for historical handwritten documents
- Gemini 2.5 Pro: Strong on diverse handwriting styles
- Avoid: Tesseract, PaddleOCR basic (not designed for handwriting)
Preprocessing
- Apply binarization to increase contrast
- Use deskewing to correct baseline drift
- Increase resolution to 400+ DPI for small text
- Consider word-level rather than character-level recognition
2. Low Resolution Documents
DPI thresholds, when upscaling helps, and when it makes things worse
What Fails
- Small fonts: 8pt and below become unreadable below 300 DPI
- Thin strokes: Fonts like Arial Narrow lose critical detail
- Similar characters: c/e, o/0, I/l/1 become indistinguishable
- Diacritics: Accent marks merge with base letters
Why It Fails
OCR models learn from high-resolution training data. At low resolution, features disappear - the subtle curves that distinguish 'c' from 'e' are lost.
Aliasing artifacts from poor sampling create false features that confuse recognition.
Neural networks are sensitive to input distribution shift. Low-res images are out-of-distribution for most models.
Solution: Intelligent Upscaling
Use Lanczos interpolation for upscaling. Avoid bicubic for text - it introduces blur. For severely degraded images, consider AI upscalers like Real-ESRGAN.
```python
import cv2
import numpy as np

def upscale_for_ocr(image_path, target_dpi=300):
    """Upscale low-resolution images for better OCR."""
    img = cv2.imread(image_path)
    # Estimate current DPI (assuming standard scan)
    height, width = img.shape[:2]
    current_dpi = min(width, height) / 8.5  # Assume letter size
    if current_dpi < target_dpi:
        scale = target_dpi / current_dpi
        new_width = int(width * scale)
        new_height = int(height * scale)
        # Use INTER_LANCZOS4 for upscaling
        upscaled = cv2.resize(img, (new_width, new_height),
                              interpolation=cv2.INTER_LANCZOS4)
        return upscaled
    return img
```

Warning: Upscaling cannot recover information that was never captured. If the original scan was 72 DPI, upscaling to 300 DPI creates interpolated pixels, not real detail. It can help with some models but may introduce artifacts.
3. Complex Table Failures
Merged cells, nested tables, spanning headers, and why linear reading breaks
Failure Patterns
- Merged cells: OCR reads across the merge, mixing unrelated data
- Nested tables: Inner table structure is completely lost
- Spanning headers: Column associations break when headers span multiple columns
- Borderless tables: Without visual separators, alignment is guessed
- Multi-line cells: Line breaks within cells create phantom rows
Benchmark Reality
TEDS (Tree-Edit-Distance-based Similarity) measures the structural accuracy of table extraction; higher is better.
Solutions
Model Selection
Use VLM-based models with explicit table understanding:
- PaddleOCR-VL: Best table TEDS, outputs HTML/Markdown
- dots.ocr 3B: Compact model with strong table handling
- Docling: Dedicated table extraction pipeline
Output Format Strategy
Request structured output to preserve table semantics:
- HTML tables: Preserves colspan/rowspan
- Markdown: Simpler but loses merged cells
- JSON: Best for downstream processing
- Avoid: Plain text extraction for tables
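When you do request HTML output, the colspan information is only useful if your downstream code actually expands it. A sketch using Python's stdlib `html.parser` — the `TableGrid` class and its colspan-only handling are our illustration; rowspan and nested tables need extra bookkeeping:

```python
from html.parser import HTMLParser

class TableGrid(HTMLParser):
    """Expand an OCR model's HTML table (with colspan) into a 2-D grid."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._span, self._in_cell = [], [], 1, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._span = int(dict(attrs).get("colspan", 1))
            self._text = []

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            cell = "".join(self._text).strip()
            # Repeat the cell so merged columns stay aligned downstream
            self._row.extend([cell] * self._span)
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)

p = TableGrid()
p.feed('<table><tr><th colspan="2">Q1</th></tr>'
       '<tr><td>Jan</td><td>Feb</td></tr></table>')
# p.rows == [['Q1', 'Q1'], ['Jan', 'Feb']]
```

Repeating the merged header keeps every data column addressable by index, which is usually what a downstream dataframe or JSON consumer needs.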
4. Mixed Language Confusion
Code-switching, embedded formulas, and when the language model gets confused
What Fails
- Code-switching: German text with English product names
- Embedded formulas: LaTeX or math notation within text
- Script mixing: Latin + Cyrillic + Greek in same document
- Transliteration: Names written in multiple scripts
- CJK + Latin: Japanese/Chinese with English terminology
Why It Fails
Most OCR models use language-specific decoders. When you select "German", it expects German vocabulary and grammar patterns.
Embedded English words get force-fit into German vocabulary, creating nonsense like "Softwerr" for "Software".
Math formulas use symbols that look like letters but have different meanings, causing semantic confusion.
Solutions
Model Selection
- Gemini 2.5 Pro: Best multilingual handling, no language selection needed
- Chandra OCR: 40+ languages, handles mixed content
- GPT-4o: Uses context to resolve language ambiguity
Strategies
- Use auto-detect mode instead of forcing language
- For formulas, use specialized extractors (LaTeX-OCR, Mathpix)
- Post-process with spell-check that handles multiple dictionaries
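The multi-dictionary check is simple to sketch. The toy word sets below stand in for real Hunspell dictionaries (e.g. `de_DE`, `en_US`); the point is that a code-switched token only needs to appear in one of them:

```python
import re

# Toy word lists standing in for full Hunspell dictionaries
GERMAN = {"die", "neue", "version", "der"}
ENGLISH = {"software", "release", "the"}

def flag_unknown(text, dictionaries=(GERMAN, ENGLISH)):
    """Flag OCR tokens that no loaded dictionary recognizes."""
    flagged = []
    for token in re.findall(r"[A-Za-zÄÖÜäöüß]+", text):
        # Accept the token if ANY language knows it; forcing a single
        # dictionary would wrongly reject the embedded English words
        if not any(token.lower() in d for d in dictionaries):
            flagged.append(token)
    return flagged

flag_unknown("Die neue Softwerr Version")  # ['Softwerr']
```

Flagged tokens are candidates for correction or human review, not automatic rejection — proper nouns and product names will legitimately miss every dictionary.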
5. Layout Complexity
Multi-column, footnotes, marginalia, sidebars, and reading order disasters
What Fails
- Column bleed: Two columns read as alternating lines
- Footnotes: Mixed with main text, destroying flow
- Marginalia: Margin notes read as part of body text
- Floating boxes: Callouts and sidebars interrupt reading order
- Wrapped text: Text flowing around images gets fragmented
Reading Order Benchmark
Reading order accuracy from OmniDocBench. Measures correct sequencing of document elements.
Solutions
Model Selection
- GPT-4o: Best at understanding reading order from layout context
- PaddleOCR-VL: Explicit layout detection module
- Docling: Academic paper specialist
Preprocessing
- Use layout detection before OCR (YOLO, LayoutLM)
- Extract regions separately then reassemble
- For digital PDFs, try extracting the embedded text layer (PyMuPDF) before falling back to OCR
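The extract-then-reassemble step reduces to a reading-order sort over detected regions. A sketch under a strong assumption — exactly two columns split at the page midline, which real layout detectors would determine rather than hardcode:

```python
def reading_order(boxes, page_width):
    """Order layout regions for a two-column page.

    boxes: list of (x, y, w, h) tuples from a layout detector.
    """
    mid = page_width / 2
    # Assign each box to a column by its horizontal center
    left = [b for b in boxes if b[0] + b[2] / 2 < mid]
    right = [b for b in boxes if b[0] + b[2] / 2 >= mid]
    # Read the left column top-to-bottom, then the right column
    return sorted(left, key=lambda b: b[1]) + sorted(right, key=lambda b: b[1])
```

OCR each region in this order and concatenate the results; footnote and marginalia regions can be filtered out before the sort so they never interrupt the body text.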
6. Image Quality Issues
Shadows, folds, reflections, skew, blur, and physical document damage
Shadows & Lighting
- Page curl shadows near spine
- Finger shadows from holding
- Uneven lighting across page
Physical Damage
- Creases and fold marks
- Water damage / staining
- Torn edges, punch holes
Capture Issues
- Skewed/rotated capture
- Motion blur
- Reflections from glossy paper
Deskewing
Correct rotation using Hough transform or minimum area rectangle detection. Skew angles above 5 degrees significantly impact OCR accuracy.
```python
import cv2
import numpy as np

def deskew_image(image_path):
    """Correct document skew using minimum area rectangle detection."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Threshold, then collect the coordinates of every foreground pixel
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
    # Find the minimum area rectangle around the text pixels
    angle = cv2.minAreaRect(coords)[-1]
    # Map minAreaRect's angle convention (OpenCV <4.5 reports [-90, 0))
    # to the correction we need to apply
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    # Rotate the image around its center
    (h, w) = img.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h),
                             flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)
    return rotated
```

Shadow Removal
Remove uneven lighting and shadows using background subtraction. Particularly important for book scans and camera captures.
```python
import cv2
import numpy as np

def remove_shadows(image_path):
    """Remove shadows from document images."""
    img = cv2.imread(image_path)
    rgb_planes = cv2.split(img)
    result_planes = []
    for plane in rgb_planes:
        # Dilate to get background
        dilated = cv2.dilate(plane, np.ones((7, 7), np.uint8))
        bg = cv2.medianBlur(dilated, 21)
        # Subtract background and normalize
        diff = 255 - cv2.absdiff(plane, bg)
        norm = cv2.normalize(diff, None, 0, 255, cv2.NORM_MINMAX)
        result_planes.append(norm)
    result = cv2.merge(result_planes)
    return result
```

Full Preprocessing Pipeline
Combine denoising, contrast enhancement, and binarization for optimal OCR input.
```python
import cv2
import numpy as np

def preprocess_for_ocr(image_path):
    """Full preprocessing pipeline for OCR."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # 1. Denoise
    denoised = cv2.fastNlMeansDenoising(img, h=10)
    # 2. Increase contrast (CLAHE)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(denoised)
    # 3. Binarization (adaptive threshold)
    binary = cv2.adaptiveThreshold(
        enhanced, 255,
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 11, 2
    )
    # 4. Remove small noise (morphological opening)
    kernel = np.ones((2, 2), np.uint8)
    cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    return cleaned
```

7. Mitigation Strategies
Systematic approaches to improving OCR accuracy across all failure modes
Preprocessing Pipeline
1. Deskew: Correct rotation before anything else
2. Shadow removal: Normalize lighting across the image
3. Upscale: If below 300 DPI, upscale to target
4. Denoise: Remove scanner noise and artifacts
5. Binarize: Optional, helps some engines
Post-Processing
- Spell check: Language-aware correction (Hunspell)
- Format validation: Regex for dates, numbers, IDs
- Confidence filtering: Flag low-confidence regions for review
- Multi-engine voting: Run 2-3 engines, take consensus
- LLM correction: Use GPT/Claude to fix obvious errors
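Format validation is the cheapest of these wins. A sketch over extracted fields — the `PATTERNS` table and the field names are hypothetical; substitute the formats your documents actually use:

```python
import re

# Hypothetical field patterns; adapt to your own document types
PATTERNS = {
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "amount": re.compile(r"^\d{1,3}(,\d{3})*\.\d{2}$"),
    "invoice_id": re.compile(r"^INV-\d{6}$"),
}

def validate_fields(extracted):
    """Return the field names whose OCR'd values fail their format check."""
    return [name for name, value in extracted.items()
            if name in PATTERNS and not PATTERNS[name].match(value)]

# 'O' misread for '0' in the date trips the check
validate_fields({"date": "2024-O1-15", "amount": "1,250.00"})  # ['date']
```

Failures from this check are exactly the substitution errors (O/0, l/1, S/5) that spell checkers miss because they only occur in structured fields.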
Quality Assurance
- Sample-based auditing: Manually verify 1-5% of output
- Known-document testing: Include test documents with ground truth
- Confidence thresholds: Route low-confidence to human review
- Error tracking: Log and categorize failures for improvement
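The confidence-threshold routing above is a few lines in practice. The `(text, confidence)` pair format and the 0.85 cutoff are assumptions — engines report confidence differently, and the threshold should be tuned against your audited samples:

```python
def route_by_confidence(words, threshold=0.85):
    """Split OCR output into auto-accepted text and spans for human review.

    words: list of (text, confidence) pairs, as most engines report.
    """
    accepted, review = [], []
    for text, conf in words:
        # Anything below threshold goes to the human-review queue
        (accepted if conf >= threshold else review).append((text, conf))
    return accepted, review
```

Track the review queue's size over time: a sudden spike usually means an upstream change (new scanner, new document type) rather than a model regression.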
When to Accept Failure
- Severely damaged documents: 40%+ text obscured
- Extremely low resolution: Sub-100 DPI, no upscaling helps
- Artistic/decorative fonts: Designed to be hard to read
- Cost exceeds value: Manual entry cheaper than fixing OCR
Model Selection by Failure Mode
Choose your model based on your primary failure mode, not just overall accuracy.
| Failure Mode | Best Choice | Alternative | Avoid |
|---|---|---|---|
| Handwriting | GPT-4o, CHURRO | Gemini 2.5 Pro | Tesseract, PaddleOCR basic |
| Low Resolution | Chandra OCR | GPT-4o with preprocessing | Any without upscaling |
| Complex Tables | PaddleOCR-VL, dots.ocr | Mistral OCR 3 | clearOCR, Tesseract |
| Mixed Languages | Gemini 2.5 Pro | Chandra OCR | Single-language engines |
| Complex Layout | GPT-4o, dots.ocr | PaddleOCR-VL | Traditional OCR |
| Poor Image Quality | Chandra OCR | GPT-4o | Engines without preprocessing |
Need Help with Specific Failures?
We offer private evaluations on your actual documents. Find out which models fail on your specific document types.
About This Analysis
This failure mode analysis is based on our internal testing across thousands of documents, combined with publicly available benchmark data from OmniDocBench, OCRBench v2, and CHURRO-DS. Model recommendations are based on demonstrated performance, not vendor claims.