Build a Document Scanner
Detect document edges, correct perspective, and enhance scanned images. Interactive demo below.
How It Works
Document scanning involves four steps:
- Edge detection: Find where the document boundaries are using Canny edge detection
- Contour finding: Extract the document outline as a 4-point polygon
- Perspective transform: Warp the tilted document into a flat rectangle
- Enhancement: Improve contrast and optionally convert to black-and-white
Step 1: Edge Detection
The Canny algorithm finds edges by looking for rapid changes in pixel intensity. We first convert to grayscale and blur to reduce noise:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blurred, 50, 150)
Complete Python Code
import cv2
import numpy as np

def scan_document(image_path: str, output_path: str) -> None:
    """
    Scan a document: detect edges, correct perspective, enhance.
    """
    # Load image (cv2.imread returns None if the file cannot be read)
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    orig = img.copy()

    # Resize for processing (keep aspect ratio)
    height, width = img.shape[:2]
    scale = 500 / max(height, width)
    img = cv2.resize(img, None, fx=scale, fy=scale)

    # Convert to grayscale and blur
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Edge detection (dilate to close small gaps in the edge map)
    edges = cv2.Canny(blurred, 50, 150)
    edges = cv2.dilate(edges, np.ones((3, 3), np.uint8))

    # Find contours, largest area first
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)

    # Find the document contour (largest 4-sided polygon)
    doc_contour = None
    for contour in contours:
        peri = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.02 * peri, True)
        if len(approx) == 4:
            doc_contour = approx
            break

    if doc_contour is None:
        raise ValueError("Could not detect document edges")

    # Scale contour back to original image size
    doc_contour = (doc_contour / scale).astype(np.float32)

    # Perspective transform and enhance... (this step produces scanned_enhanced)

    cv2.imwrite(output_path, scanned_enhanced)
    print(f"Saved: {output_path}")

Install: pip install opencv-python numpy
When Edge Detection Fails
Auto-detection fails when:
- Document is on a similar-colored background (white paper on white desk)
- Part of the document is cut off in the photo
- Strong shadows or reflections break the edge
- Multiple documents in the frame
For these cases, let users manually select the 4 corners (like in the demo above). Many apps show the auto-detected corners but allow adjustment before transforming.
Adding OCR
Once you have a clean scan, run OCR to extract text. See Getting Started with OCR for how to use PaddleOCR or GPT-4o on your scanned documents.