Interactive OCR Correction
Addressing the myth of "impeccable" OCR with a user-in-the-loop mobile interface.
The Problem: The Flicker
There really isn't an "impeccable" OCR solution, especially on mobile video feeds.
A classic example: The letter H, when viewed at a slightly skewed angle, will often "flicker" into an N and back again in the OCR results. This creates a jittery, unreliable data stream that frustrates users.
The Solution: Interactive Anchor UI
Instead of trying to force the OCR model to be perfect, we accept the uncertainty and build a UI that lets the user resolve it quickly. The solution uses Google ML Kit to scan and correlate text regions, spawning interactive dropdowns directly on the camera feed.
1. Correlating Flicker
The system tracks text regions across frames. When it detects a region "flickering" between multiple values (e.g., "HELLO" vs "NELLO"), it aggregates these candidates instead of just showing the latest one.
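A minimal sketch of this aggregation step, in the same spirit as the conceptual pseudocode later in this post (the `CandidateTracker` class and its method names are illustrative, not part of the ML Kit API):

```typescript
// Aggregates the values a single text region has been recognized as,
// so per-frame flicker becomes a ranked list of candidates instead of
// a jittery "latest value".
class CandidateTracker {
  private counts = new Map<string, number>();

  // Record the text recognized for this region in the current frame.
  observe(text: string): void {
    this.counts.set(text, (this.counts.get(text) ?? 0) + 1);
  }

  // Candidates ordered by how often each value was seen.
  candidates(): string[] {
    return [...this.counts.entries()]
      .sort((a, b) => b[1] - a[1])
      .map(([text]) => text);
  }
}
```

Observing "HELLO", "NELLO", "HELLO" across three frames would yield the candidate list ["HELLO", "NELLO"], with the most frequently seen value first.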
2. User Selection
A dropdown or selection UI is spawned directly on top of the text in the photo. The user can tap to confirm the correct value ("H" or "N") from the detected candidates.
Handling Camera Drift
The biggest challenge in overlaying UI on a live camera feed is movement. If the user moves their hand slightly, the UI must "stick" to the real-world object.
The Centroid Anchor Technique
To stay resilient to camera drift, the system uses the centroid of the text region as its anchor point.
- Step 1: Calculate the centroid (center point) of the detected text bounding box in the current frame.
- Step 2: In the next frame, search for text regions near that previous centroid.
- Step 3: "Anchor" the UI element to this calculated center, rather than the volatile edges of the bounding box.
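The steps above can be sketched as follows (the drift radius of 40 px is an assumed tuning value, not something prescribed by the technique):

```typescript
interface Rect { x: number; y: number; width: number; height: number; }
interface Point { x: number; y: number; }

// Step 1: the centroid (center point) of a detected bounding box.
function centroid(r: Rect): Point {
  return { x: r.x + r.width / 2, y: r.y + r.height / 2 };
}

// Step 2: match a new detection to the nearest previous centroid,
// accepting the match only within a drift radius (pixels; assumed value).
function findNearestAnchor(p: Point, anchors: Point[], maxDist = 40): number {
  let best = -1;
  let bestDist = maxDist;
  anchors.forEach((a, i) => {
    const d = Math.hypot(a.x - p.x, a.y - p.y);
    if (d <= bestDist) {
      bestDist = d;
      best = i;
    }
  });
  return best; // index of the matched anchor, or -1 if none is close enough
}
```

Step 3 is then simply positioning the overlay at the matched anchor's centroid rather than at any edge of the bounding box.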
This "anchoring into itself" technique allows the overlay to follow the text smoothly even as the camera drifts or the bounding box shape fluctuates slightly.
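One common way to get that smooth following behavior is to low-pass filter the anchor position between frames; an exponential moving average is a standard choice here (an assumption on my part, as the original technique only specifies that the overlay follows smoothly):

```typescript
interface Point { x: number; y: number; }

// Exponential moving average over the anchor position: a higher alpha
// tracks fast camera motion, a lower alpha suppresses per-frame jitter.
function smoothAnchor(prev: Point, next: Point, alpha = 0.3): Point {
  return {
    x: prev.x + alpha * (next.x - prev.x),
    y: prev.y + alpha * (next.y - prev.y),
  };
}
```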
Implementation Notes (Google ML Kit)
Google ML Kit on Android and iOS provides the raw text blocks with bounding boxes. The custom tracking logic sits on top:
// Conceptual logic (pseudocode)
onFrame(image) {
  results = MLKit.detectText(image);
  for (block in results) {
    // Calculate a stable center for this text block
    center = getCentroid(block.frame);

    // Find a matching active tracker near that center
    tracker = findTrackerNear(center);
    if (tracker) {
      tracker.update(center, block.text);
      // If the recognized text changed, record it as a candidate
      if (tracker.text != block.text) {
        tracker.addCandidate(block.text);
      }
    } else {
      createTracker(center, block.text);
    }
  }
  // Update UI overlays based on the active trackers
  drawOverlays(trackers);
}

By maintaining a history of values for each tracked region, the system turns the "bug" of flickering into a "feature": a list of possible options for the user.
Key Takeaways
- 1. Accept Imperfection: Mobile OCR will flicker. Design for it, don't just fight it.
- 2. User-in-the-Loop: When confidence varies, ask the user. Spawning a dropdown on the object is intuitive.
- 3. Centroid Anchoring: Use the center of mass for tracking text regions to create a stable UI in a handheld AR experience.