Who leads the FUNSD benchmark?

LayoutLMv3-large currently leads FUNSD with an f1 score of 92.08.

What is the state-of-the-art score on FUNSD?

The state-of-the-art result on FUNSD is 92.08 (f1), achieved by LayoutLMv3-large in 2022 and still unbeaten among the results tracked through 2023.

How many models are tracked on FUNSD?

Codesota tracks 13 models on FUNSD.

When was the FUNSD leaderboard last updated?

The FUNSD leaderboard on Codesota includes results through 2023, with the earliest tracked result from 2020.

FUNSD.

Name: FUNSD Benchmark Results
Creator: Unknown
Published: 2020-01-01
License: https://creativecommons.org/licenses/by/4.0/

199 fully annotated forms. Tests semantic entity labeling and linking.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

f1

F1 is the reported evaluation metric for FUNSD. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
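
As a rough illustration of how the metric behaves, the sketch below assumes f1 here is the usual harmonic mean of precision and recall (the page itself does not spell out the formula). It uses the precision and recall quoted in the StructuralLM notes on the leaderboard; the function name `f1_score` is only illustrative. The result comes out near 85.13, which agrees with the listed 85.14 up to rounding of the published precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs.

    Assumption: this is the standard F1 definition; the leaderboard does not
    state which exact variant each paper reports.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted for StructuralLM in the leaderboard notes.
structurallm_f1 = f1_score(83.52, 86.81)
print(f"StructuralLM f1 from P/R: {structurallm_f1:.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high f1 by trading away either precision or recall.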

Trust tiers for f1: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Notes	Trust	Score	Year
01	LayoutLMv3-large	Table 1 in paper. ACM MM 2022. SOTA at time of publication.	verified	92.08	2022
02	UDOP	Unified Document Processing. Table 3 in paper. CVPR 2023. Single generative model for all document tasks.	verified	91.62	2023
03	LayoutLMv3-base	Table 1 in paper. ACM MM 2022.	verified	90.29	2022
04	DocFormerv2-large	Table 5 in paper. ICCV 2023.	verified	88.89	2023
05	LiLT[EN-R2]-base	LiLT with English RoBERTa backbone (EN-R2), base size. Table 2 in paper. ACL 2022. Best monolingual FUNSD result.	verified	88.41	2022
06	DocFormerv2-base	Table 5 in paper. ICCV 2023.	verified	88.37	2023
07	StructuralLM	Large variant. Table 1 in paper. ACL 2021. Precision 83.52, Recall 86.81.	verified	85.14	2021
08	FormNet	Table 1 in paper. ACL 2022. Uses rich structural encoding via graph neural network.	verified	84.69	2022
09	BROS-large	FUNSD entity extraction. Table 3 in paper. AAAI 2022.	verified	84.52	2022
10	LayoutLMv2-large	Table 6 in paper. ACL 2021.	verified	84.20	2021
11	LayoutLMv2-base	Table 6 in paper. ACL 2021.	verified	82.76	2021
12	LayoutLMv1-base	LayoutLM-base with text+layout+image embeddings, 11M docs. Best base variant. Table 1 in paper. KDD 2020.	verified	79.27	2020
13	LayoutLMv1-large	LayoutLM-large, text+layout, MVLM, 11M docs, 1 epoch. Table 1 in paper. KDD 2020.	verified	77.89	2020
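
The muting rule stated above the table can be sketched in a few lines. This is one reading of that sentence, not Codesota's actual implementation: a row is treated as muted when some other row from the same or an earlier year has a strictly higher score. The `Row` type and `is_muted` helper are hypothetical names, and only year-level dates are used because that is all the table provides.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    score: float
    year: int


# Scores and years copied from the f1 leaderboard table.
ROWS = [
    Row("LayoutLMv3-large", 92.08, 2022),
    Row("UDOP", 91.62, 2023),
    Row("LayoutLMv3-base", 90.29, 2022),
    Row("DocFormerv2-large", 88.89, 2023),
    Row("LiLT[EN-R2]-base", 88.41, 2022),
    Row("DocFormerv2-base", 88.37, 2023),
    Row("StructuralLM", 85.14, 2021),
    Row("FormNet", 84.69, 2022),
    Row("BROS-large", 84.52, 2022),
    Row("LayoutLMv2-large", 84.20, 2021),
    Row("LayoutLMv2-base", 82.76, 2021),
    Row("LayoutLMv1-base", 79.27, 2020),
    Row("LayoutLMv1-large", 77.89, 2020),
]


def is_muted(row: Row, rows: list[Row]) -> bool:
    """Assumed rule: an earlier or same-year result already scored better."""
    return any(o.year <= row.year and o.score > row.score for o in rows)


# Rows that are not muted were state of the art when published.
sota_holders = sorted(
    (r for r in ROWS if not is_muted(r, ROWS)), key=lambda r: r.year
)
for r in sota_holders:
    print(r.year, r.model, r.score)
```

Under this reading, the unmuted rows also trace the year-over-year SOTA history: LayoutLMv1-base in 2020, StructuralLM in 2021, and LayoutLMv3-large in 2022, with no 2023 result improving on it.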

§ 03 · Lineage

FUNSD in context.

Predecessors (1)

active · 2002-09

IAM

From transcription to structure: FUNSD reframed OCR as 'find the question, link to its answer' rather than 'recognise every character'. The shift that produced LayoutLM and the entire form-understanding line.

This benchmark (1)

saturated · 2019-05

FUNSD

Successors (1)

superseded · 2023-05

OCRBench

Once VLMs could read at all, evaluation needed to span more than forms. OCRBench bundled scene text, document VQA, KIE and handwritten math into one composite — the first VLM-era OCR benchmark.