Build a Document Scanner

Detect document edges, correct perspective, and enhance scanned images.

How It Works

Document scanning involves four steps:

  1. Edge detection: Find where the document boundaries are using Canny edge detection
  2. Contour finding: Extract the document outline as a 4-point polygon
  3. Perspective transform: Warp the tilted document into a flat rectangle
  4. Enhancement: Improve contrast and optionally convert to black-and-white

Step 1: Edge Detection

The Canny algorithm finds edges by looking for rapid changes in pixel intensity. We first convert to grayscale and blur to reduce noise:

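gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # OpenCV loads images in BGR order
blurred = cv2.GaussianBlur(gray, (5, 5), 0)    # 5x5 kernel; sigma 0 derives it from the kernel size
edges = cv2.Canny(blurred, 50, 150)            # lower and upper hysteresis thresholds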
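The thresholds 50 and 150 are fixed values. A commonly used heuristic, which is not part of this tutorial's pipeline, derives them from the median intensity of the image instead. A minimal sketch, using NumPy only; the function name `canny_thresholds` and the `sigma` default are assumptions chosen for illustration:

```python
import numpy as np

def canny_thresholds(gray: np.ndarray, sigma: float = 0.33) -> tuple[int, int]:
    """Return (lower, upper) Canny thresholds centred on the median intensity.

    Hypothetical helper: `sigma` sets how wide the band around the median
    is; 0.33 is a conventional starting point, not a tuned value.
    """
    median = float(np.median(gray))
    lower = int(max(0.0, (1.0 - sigma) * median))
    upper = int(min(255.0, (1.0 + sigma) * median))
    return lower, upper
```

The result would stand in for the fixed values, for example `lower, upper = canny_thresholds(blurred)` followed by `cv2.Canny(blurred, lower, upper)`. An image whose median is close to 0 yields thresholds close to 0, which marks almost every gradient as an edge, so the fixed thresholds remain a sensible fallback for very dark photos.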

Complete Python Code

import cv2
import numpy as np

def scan_document(image_path: str, output_path: str) -> None:
    """
    Scan a document: detect edges, correct perspective, enhance.
    """
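    # Load image (cv2.imread returns None instead of raising on failure)
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    orig = img.copy()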

    # Resize for processing (keep aspect ratio)
    height, width = img.shape[:2]
    scale = 500 / max(height, width)
    img = cv2.resize(img, None, fx=scale, fy=scale)

    # Convert to grayscale and blur
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Edge detection
    edges = cv2.Canny(blurred, 50, 150)
    edges = cv2.dilate(edges, np.ones((3, 3), np.uint8))

    # Find contours
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)

    # Find the document contour (largest 4-sided polygon)
    doc_contour = None
    for contour in contours:
        peri = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.02 * peri, True)

        if len(approx) == 4:
            doc_contour = approx
            break

    if doc_contour is None:
        raise ValueError("Could not detect document edges")

    # Scale contour back to original image size
    doc_contour = (doc_contour / scale).astype(np.float32)

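    # Perspective transform and enhance.
    # The steps below are one minimal sketch; the corner ordering assumes
    # the document is not rotated close to 45 degrees in the photo.
    pts = doc_contour.reshape(4, 2).astype(np.float32)
    sums = pts.sum(axis=1)                # x + y
    diffs = np.diff(pts, axis=1).ravel()  # y - x
    tl, br = pts[np.argmin(sums)], pts[np.argmax(sums)]
    tr, bl = pts[np.argmin(diffs)], pts[np.argmax(diffs)]
    src = np.array([tl, tr, br, bl], dtype=np.float32)

    # Output size: the longer of each pair of opposite sides
    out_w = max(int(max(np.linalg.norm(tr - tl), np.linalg.norm(br - bl))), 1)
    out_h = max(int(max(np.linalg.norm(bl - tl), np.linalg.norm(br - tr))), 1)
    dst = np.array(
        [[0, 0], [out_w - 1, 0], [out_w - 1, out_h - 1], [0, out_h - 1]],
        dtype=np.float32,
    )

    # Warp the quadrilateral in the full-resolution image into a flat rectangle
    matrix = cv2.getPerspectiveTransform(src, dst)
    scanned = cv2.warpPerspective(orig, matrix, (out_w, out_h))

    # Enhance: adaptive thresholding produces a black-and-white scan
    scanned_gray = cv2.cvtColor(scanned, cv2.COLOR_BGR2GRAY)
    scanned_enhanced = cv2.adaptiveThreshold(
        scanned_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 10
    )

    cv2.imwrite(output_path, scanned_enhanced)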
    print(f"Saved: {output_path}")

Install: pip install opencv-python numpy
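
The enhancement step listed under How It Works mentions improving contrast. One simple way to do that, sketched here with NumPy only, is percentile-based contrast stretching: intensities between a low and a high percentile are rescaled to the full 0-255 range. The function name `stretch_contrast` and the percentile defaults are assumptions for illustration, not the tutorial's own method:

```python
import numpy as np

def stretch_contrast(gray: np.ndarray, low_pct: float = 2.0, high_pct: float = 98.0) -> np.ndarray:
    """Linearly rescale a grayscale image so [low_pct, high_pct] maps to [0, 255]."""
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: nothing to stretch, return an unchanged copy
        return gray.copy()
    scaled = (gray.astype(np.float32) - lo) * (255.0 / (hi - lo))
    # Values outside the percentile band saturate at pure black or pure white
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)
```

It would be applied to the grayscale version of the warped scan, before any black-and-white conversion.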

When Edge Detection Fails

Auto-detection fails when:

  • The document sits on a similar-colored background (white paper on a white desk)
  • Part of the document is cut off in the photo
  • Strong shadows or reflections break the edge
  • Multiple documents are in the frame

For these cases, let users select the four corners manually. Many apps show the auto-detected corners and let the user adjust them before transforming.
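
Deciding when to fall back to manual selection needs some test of the auto-detected corners. A minimal sketch of one such test, assuming the corners arrive as four (x, y) points in order around the outline: it rejects quadrilaterals that are non-convex or that cover too little of the image. The name `corners_look_plausible` and the 20% area threshold are assumptions, not values from this tutorial:

```python
import numpy as np

def corners_look_plausible(corners, image_shape, min_area_ratio: float = 0.2) -> bool:
    """Heuristic sanity check for four corners given in order around the outline."""
    pts = np.asarray(corners, dtype=np.float64).reshape(-1, 2)
    if pts.shape[0] != 4:
        return False

    # Shoelace formula: polygon area from ordered vertices
    x, y = pts[:, 0], pts[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

    # Convex if the cross products of consecutive edges all share one sign
    edges = np.roll(pts, -1, axis=0) - pts
    nxt = np.roll(edges, -1, axis=0)
    cross = edges[:, 0] * nxt[:, 1] - edges[:, 1] * nxt[:, 0]
    convex = bool(np.all(cross > 0) or np.all(cross < 0))

    height, width = image_shape[:2]
    return bool(convex and area >= min_area_ratio * height * width)
```

`cv2.approxPolyDP` yields its vertices in order along the contour, so its (4, 1, 2) output can be passed in directly together with `img.shape`. When the check fails, an app would ask the user to place or adjust the corners rather than transform straight away.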

Adding OCR

Once you have a clean scan, run OCR to extract text. See Getting Started with OCR for how to use PaddleOCR or GPT-4o on your scanned documents.