§ 06 · History

One hundred and fifty years of teaching machines to read.

From a selenium photocell concept to vision-language models that read better than their operators. Every breakthrough that led to today’s document-understanding OCR systems.

Era I · 1870 — 1970

Mechanical.

The idea of a machine that could read predates computers by almost a century. Early pioneers built physical devices from selenium cells, spinning disks and vacuum tubes — driven mostly by the ambition of giving blind readers access to printed text.

1870
Carey’s retina-inspired sensor
C.R. Carey proposes a mosaic of selenium photocells that converts an image into electrical signals. The idea is decades ahead of the available technology.
[Figure: Retina mosaic to signal grid. Light intensity becomes machine vision.]
1885
Nipkow’s scanning disk
A rotating disk with spiral holes scans an image point-by-point into a serial electrical signal. The scanning principle persists in every OCR device for 60 years.
[Figure: Sequential scanning. A 2D page becomes a time-series signal.]
1912
Optophone for the blind
Edmund Fournier d’Albe’s device maps each printed character to a distinct musical chord. A trained reader reaches about one word per minute.
[Figure: Text as sound. The letter A maps to a chord.]
1914
Goldberg’s statistical machine
The first device to recognise printed characters by comparing their photocell signature against stored templates. Ancestor of all template-based OCR.
[Figure: Light pattern against stored templates. Input R is compared with P, Q, R, S; maximum overlap = R.]
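The matching principle can be sketched in software. This is only an illustrative analogue, not a model of Goldberg's optical hardware: the 5×5 glyphs, the `TEMPLATES` table and the agreement-count score are all assumptions made for the example.

```python
# Hypothetical 5x5 binary glyphs: '#' is ink, '.' is blank paper.
TEMPLATES = {
    "P": ["####.", "#...#", "####.", "#....", "#...."],
    "R": ["####.", "#...#", "####.", "#..#.", "#...#"],
    "L": ["#....", "#....", "#....", "#....", "#####"],
}

def overlap(glyph, template):
    """Count the cells where the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )

def classify(glyph):
    """Return the name of the stored template with the highest overlap."""
    return max(TEMPLATES, key=lambda name: overlap(glyph, TEMPLATES[name]))

# An 'R' with one cell of ink missing still lands closest to the R template.
noisy_r = ["####.", "#...#", "###..", "#..#.", "#...#"]
print(classify(noisy_r))
```

The weakness is visible in the table itself: every new typeface needs its own set of templates, which is why this approach stayed tied to fixed fonts.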
1929
Tauschek’s template patent
A spinning disk with cut-out letters; maximum light transmission identifies the character. Elegant and impossibly slow.
[Figure: Rotating template disk. Input B is compared against letters A to L until one matches.]
1931
IBM acquires Goldberg’s patents
The technology sits dormant for twenty years, waiting for electronics to catch up.
1949
RCA reading machine
The US Veterans Administration funds the first prototype that reads printed pages aloud. Accuracy under 50% — but OCR now has serious government funding.
1951
GISMO — first electronic OCR
David Shepard, a cryptanalyst at the Armed Forces Security Agency, replaces the spinning disk with static photocell arrays. The leap from mechanical to electronic is arguably the most important transition in OCR history.
[Figure: No moving parts. A parallel comparison circuit gives faster OCR.]
1955
MICR for banking
The American Bankers Association adopts the E-13B magnetic-ink font for check processing. Not optical — but it proves banks will pay for machine reading.
[Figure: Banking makes machine reading commercial. E-13B digits 0123456789 and a sample number 987654321, read as a magnetic waveform.]
1957
The perceptron detour
Rosenblatt’s Mark I Perceptron barely distinguishes triangles from squares. Minsky and Papert’s 1969 critique stalls neural-network research for more than a decade.
[Figure: Early learned weights. Photocells classify a triangle with confidence 0.67.]
1965
First commercial OCR
Reader’s Digest + RCA process 1,500 documents per hour — but only in the purpose-built OCR-A font.
1966
US Postal Service
Mail sorting moves to machines equipped with OCR scanners. The first industrial-scale deployment.
1968
Kurzweil’s insight
Extract structural features (strokes, apexes, crossbars), then classify. The separation of feature extraction from classification shapes OCR architectures for decades.
[Figure: Features beat font templates. Three renderings of A yield the features apex + strokes + crossbar, classified as A.]
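A toy sketch of the two-stage idea, under heavy assumptions: the 5×5 glyphs, the three features (`apex`, `crossbar`, `legs`) and the hand-written rule in `classify` are invented for illustration and are not Kurzweil's actual feature set.

```python
def runs(row):
    """Return the number of contiguous ink runs in one row of the glyph."""
    return len([segment for segment in row.split(".") if segment])

def extract_features(glyph):
    """Stage 1: reduce a glyph to font-independent structural features."""
    top, bottom = glyph[0], glyph[-1]
    return {
        # A single ink run in the top row that touches neither edge.
        "apex": runs(top) == 1 and top[0] == "." and top[-1] == ".",
        # Any interior row that is inked across the full width.
        "crossbar": any(set(row) == {"#"} for row in glyph[1:-1]),
        # Number of separate strokes reaching the baseline.
        "legs": runs(bottom),
    }

def classify(features):
    """Stage 2: decide from features alone, never from raw pixels."""
    if features["crossbar"] and features["legs"] == 2:
        return "A" if features["apex"] else "H"
    return "?"

thin_a = ["..#..", ".#.#.", "#####", "#...#", "#...#"]
bold_a = [".###.", "#...#", "#####", "#...#", "#...#"]
letter_h = ["#...#", "#...#", "#####", "#...#", "#...#"]

for glyph in (thin_a, bold_a, letter_h):
    print(classify(extract_features(glyph)))
```

Both renderings of A differ pixel by pixel yet produce the same feature set, which is the property that lets one classifier cover many typefaces.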
Era II · 1974 — 2006

Desktop.

The personal-computer revolution turned OCR from a million-dollar mainframe operation into desktop software. Neural networks arrived quietly, and one Bell Labs researcher changed everything with a 28×28 pixel grid.

1974
Kurzweil Computer Products
The first system to read any typeface. Stevie Wonder is an early customer.
1974
OCR-B — Frutiger
A machine-readable font that is also legible to humans. Still on every machine-readable passport in the world.
[Figure: Machine font that humans can read. OCR-A sample “AB12”: max separation, low elegance. OCR-B sample “AB12”: legible for passports and people.]
1976
Kurzweil reading machine
Flatbed CCD scanner + omni-font OCR + text-to-speech. $50,000. Xerox acquires the company in 1980.
[Figure: Scan-to-speech pipeline. CCD scanner, omni-font OCR, text to speech.]
1985
OCR goes desktop
Mac and Windows GUIs bring OCR software to personal computers. HP Labs begins developing Tesseract internally. Scanners drop below $1,000.
1990
LeCun’s LeNet
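A convolutional network from LeCun’s Bell Labs group learns to recognise handwritten ZIP-code digits directly from pixels. Its 1998 successor, LeNet-5, reaches about 99.2% accuracy on MNIST. One of the first networks to learn its features from data, and still the architectural ancestor of modern recognisers.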
1995
The “solved problem” illusion
OCR accuracy hits 99% on clean printed text. Industry declares the problem solved. Handwriting, receipts, faded prints remain nearly impossible.
1998
Google begins book scanning
Research that will become Google Books. Eventually 40M+ books digitised — the largest OCR deployment in history.
2000
ABBYY FineReader
Worldwide enterprise standard for document digitisation. For fifteen years, “OCR” in enterprise effectively means ABBYY.
2005
Tesseract open-sourced
HP releases the engine. Google sponsors development. High-quality OCR becomes free, and Tesseract remains one of the most-used open-source OCR engines today.
Era III · 2012 — 2022

Deep learning.

AlexNet’s 2012 ImageNet victory ignited the deep-learning revolution. Within five years every OCR pipeline was rebuilt with neural networks. Text recognition shifted from “recognise characters” to “understand documents”.

2012
AlexNet
Deep CNNs learn features humans cannot hand-engineer. Every CV problem, OCR included, is ripe for rethinking.
2015
CRNN — the ten-year king
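Convolutional-recurrent network: a CNN turns the image into a sequence of features, a recurrent network reads that sequence the way a human reads a line of text, and CTC decoding turns the per-frame predictions into a string. Dominates OCR for nearly a decade.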
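CRNN-style recognisers are typically paired with CTC, which lets the network emit one label per image column without knowing where each character starts. A minimal sketch of the greedy decoding step follows; the `BLANK` symbol and the per-frame label string are assumptions chosen for illustration, standing in for the argmax of a real network's output.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol separating repeated characters

def ctc_greedy_decode(frame_labels):
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)

# Thirteen frames of best-guess labels collapse to a five-letter word.
# The blank between the two 'l' runs is what keeps the double letter.
print(ctc_greedy_decode("hh-e-ll-ll-oo"))
```

Because each frame is decoded from its own best guess, this step has no way to revisit the image when two narrow characters resemble one wide one.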
2015
reCAPTCHA v2
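reCAPTCHA’s earlier word puzzles had users transcribe scanned books and Street View house numbers for free. The v2 image grids (“select all traffic lights”) extend the same crowdsourced labelling to Google’s vision models. Billions of annotations, gathered from an ordinary login step.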
2017
EAST and CRAFT
EAST runs scene-text detection at about 13 FPS on a single GPU; CRAFT follows in 2019 with character-level localisation. Both localise text so recognition networks can focus on reading it.
2018
Attention replaces CTC
Decoders look back at the image while predicting each character. “rn” versus “m” becomes solvable.
2019
PaddleOCR released
Baidu’s Apache 2.0 toolkit becomes the default for self-hosted OCR and the foundation of the VLM-OCR era.
2021
Transformers enter OCR
TrOCR and Donut prove pure transformer architectures can match or beat CRNN. Donut processes document images end-to-end with no traditional OCR module at all.
Era IV · 2023 — 2026

Vision-language models.

The most disruptive shift in OCR history. Models built to understand images turn out to be better at reading text than models built specifically for OCR. The commercial OCR industry is blindsided.

2023
GPT-4V “accidentally” wins
Nobody optimised it for OCR, yet it immediately outperforms every dedicated OCR system on complex documents. “Understanding” turns out to be a superset of “reading”.
2024
Economics shift
Purpose-built doc-AI tools arrive: Mistral OCR, Docling, olmOCR and model-specific parsers. The cost of high-quality OCR starts depending on benchmark fit, GPU utilization and operations, not just vendor list price.
2025
Open weights become serious
PaddleOCR-VL-class systems make self-hosted document parsing credible. The right comparison is no longer open source versus paid API; it is output contract, verification tier, cost model and deployment risk.
2026
VLMs change the OCR contract
SOTA shifts from plain character recognition to document understanding. Traditional detect-recognise-post-process pipelines still matter for constrained edge and batch text, but they no longer define complex document SOTA.
[Figure: OCR collapses into document understanding. Detect, recognise, order and parse become one VLM pass.]