Who leads the IAM benchmark?

GPT-4o mini currently leads IAM on WER with a score of 3.34 (lower is better). On CER, HTR-JAND leads with 1.23.

What is the state-of-the-art score on IAM?

The state-of-the-art WER on IAM is 3.34, achieved by GPT-4o mini as of 2026. The best CER is 1.23, from HTR-JAND with lexicon-based correction.

How many models are tracked on IAM?

Codesota tracks 25 models on IAM across 2 metrics (WER and CER).

When was the IAM leaderboard last updated?

The IAM leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2018.

Codesota · Benchmark · IAM


IAM.

Name: IAM Benchmark Results
Creator: Unknown
Published: 2018-01-01
License: https://creativecommons.org/licenses/by/4.0/

13,353 handwritten English text lines from 657 writers. A standard benchmark for offline handwritten text recognition.


§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


WER

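Word error rate (WER) is the word-level edit distance between a model's transcription and the reference text (substitutions + deletions + insertions), divided by the number of words in the reference and reported as a percentage. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.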

Lower is better
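As a rough illustration of the definition above, here is a minimal Python sketch of line-level WER, assuming whitespace tokenisation and unit cost for every substitution, deletion and insertion. The helper names `edit_distance` and `wer` are hypothetical, not a Codesota or paper API; published IAM results may tokenise punctuation differently, which is one reason WERs from different sources are not always directly comparable.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two token sequences (unit cost per edit)."""
    # At the start of iteration i, prev[j] holds the distance between the
    # first i-1 reference tokens and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))  # row 0: j insertions
    for i, r in enumerate(ref, start=1):
        curr = [i]  # i deletions turn ref[:i] into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: reference token missed
                curr[j - 1] + 1,         # insertion: extra hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using whitespace tokenisation (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    # Can exceed 100 when the hypothesis contains many inserted words.
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(wer("the quick brown fox", "the quick brown box"))  # 25.0
```

Because the denominator is the reference length only, insertions are penalised without bound; that is why "lower is better" has no upper cap at 100.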

Trust tiers for WER: verified, paper, vendor, community, unverified.

Muted rows were not state of the art when published — an earlier or same-year result already scored better.
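One way to read the muting rule: a row is muted if any other row from the same or an earlier year has a strictly lower score. The Python sketch below implements that reading; the `Result` type and `muted_flags` helper are hypothetical, and Codesota's own implementation is not shown on this page. The sample rows are a subset of the WER leaderboard on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    score: float  # error rate in percent; lower is better
    year: int


def muted_flags(results: list[Result]) -> list[bool]:
    """True for rows that were not state of the art when published.

    A row is muted when another row from the same or an earlier year has a
    strictly lower score. Equal scores do not mute each other, and a row
    never mutes itself because its score is not lower than its own.
    """
    return [
        any(other.year <= row.year and other.score < row.score for other in results)
        for row in results
    ]


# A subset of the WER rows on this page.
WER_ROWS = [
    Result("Start, Follow, Read", 23.2, 2018),
    Result("Leaky LP Cell", 15.9, 2019),
    Result("Decoupled Attention Network", 19.6, 2019),
    Result("VAN", 16.3, 2022),
    Result("HTR-JAND", 3.78, 2024),
    Result("GPT-4o mini", 3.34, 2026),
]

if __name__ == "__main__":
    for row, muted in zip(WER_ROWS, muted_flags(WER_ROWS)):
        print(f"{row.model}: {'muted' if muted else 'SOTA when published'}")
```

With these rows, Decoupled Attention Network and VAN come out muted (both were already beaten by Leaky LP Cell), while HTR-JAND and GPT-4o mini are not.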

Rank	Model	Notes	Trust	WER (%)	Year	Links
01	GPT-4o mini	GPT-4o-mini WER on IAM line-level. March 2025. From: Benchmarking Large Language Models for Handwritten Text Recognition (arXiv 2503.15195).	verified	3.34	2026	Source
02	HTR-JAND	HTR-JAND (+LBC). Dec 2024. Table VI IAM WER. Includes Lexicon-Based Correction post-processing.	verified	3.78	2024	Source
03	DRetHTR-base	DRetHTR-base. Feb 2026. Table 9: IAM Aachen split (IAM-A) WER.	verified	6.55	2026	Source
04	MetaWriter	MetaWriter. CVPR 2025. IAM line-level WER.	verified	10.32	2025	Source
05	HTR-ConvText	HTR-ConvText. Dec 2024. Table 2 IAM test set.	verified	12.9	2024	Source
06	HTR-VT (line-level)	From paper: HTR-VT: Handwritten Text Recognition with Vision Transformer	verified	14.9	2024	Paper, Code
07	HTR-VT	HTR-VT. Li et al. 2024. Table 4 IAM test set.	verified	14.9	2024	Source
08	Leaky LP Cell	From paper: No Padding Please: Efficient Neural Handwriting Recognition	verified	15.9	2019	Paper, Code
09	VAN	Vertical Attention Network (VAN). WER from comparison tables in HTR-VT (arXiv 2409.08573) and HTR-ConvText (arXiv 2512.05021). IAM line-level.	verified	16.3	2022	Source
10	Decoupled Attention Network	From paper: Decoupled Attention Network for Text Recognition	verified	19.6	2019	Paper, Code
11	Start, Follow, Read	From paper: Start, Follow, Read: End-to-End Full-Page Handwriting Recognition	verified	23.2	2018	Paper, Code

CER

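Character error rate (CER) is the character-level edit distance between a model's transcription and the reference text (substitutions + deletions + insertions), divided by the number of characters in the reference and reported as a percentage. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.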

Lower is better
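CER applies the same edit distance to characters instead of words. The Python sketch below also shows one aggregation choice: edits and reference lengths are summed over all test lines before dividing. That pooled convention is an assumption here; papers do not always say whether they pool or average per-line CER, and the two can differ. `levenshtein` and `corpus_cer` are hypothetical helper names, spaces count as characters, and `len()` counts Unicode code points rather than grapheme clusters.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance (unit cost per edit)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """CER in percent, pooled over (reference, hypothesis) line pairs.

    Edits and reference lengths are summed over all lines before dividing
    (an assumed convention). Spaces count as characters.
    """
    total_edits = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    return 100.0 * total_edits / total_chars


if __name__ == "__main__":
    lines = [
        ("handwriting", "handwritinq"),  # one substitution
        ("recognition", "recognition"),  # exact match
    ]
    print(f"{corpus_cer(lines):.2f}")  # 4.55
```

Pooling weights long lines more heavily than short ones, so a model that fails on a few long lines is penalised more than a per-line average would suggest.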

Trust tiers for CER: verified, paper, vendor, community, unverified.


Rank	Model	Notes	Trust	CER (%)	Year	Links
01	HTR-JAND	HTR-JAND with Lexicon-Based Correction (LBC) post-processing. Dec 2024. Joint Attention Network + Knowledge Distillation + curriculum learning. Table VI IAM comparison. Note: without LBC the model reaches ~2.34% CER (Table V ablation). IAM split not explicitly stated.	verified	1.23	2024	Source
02	GPT-4o mini	GPT-4o-mini evaluated zero-shot on IAM line-level handwriting. March 2025. Outperforms Transkribus supermodel. From: Benchmarking Large Language Models for Handwritten Text Recognition (arXiv 2503.15195).	verified	1.71	2026	Source
03	DRetHTR-base	DRetHTR-base: Decoder-only Retentive Network for HTR. Feb 2026. Table 9/11: IAM Aachen split (IAM-A), line-level. 1.6-1.9x faster inference and 38-42% less memory than Transformer baseline at same accuracy.	verified	2.26	2026	Source
04	DTrOCR 105M	From paper: DTrOCR: Decoder-only Transformer for Optical Character Recognition	verified	2.38	2023	Paper, Code
05	Self-Attention + CTC + language model	From paper: Rethinking Text Line Recognition Models	verified	2.75	2021	Paper
06	TrOCR-large	TrOCR-large (BEiT-large + RoBERTa-large). Microsoft. Table 4 in Li et al. 2023. IAM line-level test split. SOTA at publication.	verified	2.89	2023	Source
07	TrOCR-large 558M	From paper: TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models	verified	2.89	2021	Paper, Code
08	Transformer + CNN	From paper: Rethinking Text Line Recognition Models	verified	2.96	2021	Paper
09	MetaWriter	MetaWriter: Personalized HTR via meta-learned prompt tuning. CVPR 2025. Table in paper: IAM line-level standard partition. Writer-adaptive; updates <1% of parameters at test time.	verified	3.36	2025	Source
10	TrOCR-base	TrOCR-base (BEiT-base + RoBERTa-base). Microsoft. Table 4 in Li et al. 2023. IAM line-level test split.	verified	3.42	2023	Source
11	TrOCR-base 334M	From paper: TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models	verified	3.42	2021	Paper, Code
12	HTR-ConvText	HTR-ConvText: CNN+Transformer hybrid (ConvText block), 65.9M params, no pre-training. DAIR-Group, Dec 2024. Table 2: IAM line-level test set (6482/976/2915 split). Best among no-pretraining methods at publication.	verified	4.00	2024	Source
13	TrOCR-small 62M	From paper: TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models	verified	4.22	2021	Paper, Code
14	Transformer w/ CNN (+synth)	From paper: Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition	verified	4.67	2020	Paper
15	HTR-VT	HTR-VT (Vision Transformer for HTR, no pre-training or synthetic data). Li et al. 2024. Table 4 IAM test set.	verified	4.70	2024	Source
16	HTR-VT (line-level)	From paper: HTR-VT: Handwritten Text Recognition with Vision Transformer	verified	4.70	2024	Paper, Code
17	VAN	Vertical Attention Network (VAN). Coquenet et al. IEEE TPAMI 2022. IAM line-level CER from comparison tables in HTR-VT (arXiv 2409.08573) and HTR-ConvText (arXiv 2512.05021).	verified	5.00	2022	Source
18	Easter2.0	From paper: Easter2.0: Improving convolutional models for handwritten text recognition	verified	6.21	2022	Paper, Code
19	FPHR+Aug Paragraph Level (~145 dpi)	From paper: Full Page Handwriting Recognition via Image to Sequence Extraction	verified	6.30	2021	Paper, Code
20	Start, Follow, Read	From paper: Start, Follow, Read: End-to-End Full-Page Handwriting Recognition	verified	6.40	2018	Paper, Code
21	Decoupled Attention Network	From paper: Decoupled Attention Network for Text Recognition	verified	6.40	2019	Paper, Code
22	FPHR+Aug Line Level (~145 dpi)	From paper: Full Page Handwriting Recognition via Image to Sequence Extraction	verified	6.50	2021	Paper, Code
23	Leaky LP Cell	From paper: No Padding Please: Efficient Neural Handwriting Recognition	verified	6.60	2019	Paper, Code
24	FPHR Paragraph Level (~145 dpi)	From paper: Full Page Handwriting Recognition via Image to Sequence Extraction	verified	6.70	2021	Paper, Code
25	Transformer w/ CNN	From paper: Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition	verified	7.62	2020	Paper

§ 03 · Lineage

IAM in context.


Predecessors (0)

None: this is where the lineage begins.

This benchmark (1)

active · 2002-09

IAM

Successors (2)

saturated · 2015-08

ICDAR 2015

From clean handwritten text to incidental scene text — same 'read the pixels' task, fundamentally different visual domain. Spawned the decade of detection-then-recognition pipelines.

saturated · 2019-05

FUNSD

From transcription to structure: FUNSD reframed OCR as 'find the question, link to its answer' rather than 'recognise every character'. The shift that produced LayoutLM and the entire form-understanding line.
