Document Image Classification · benchmark dataset · 2020 · EN

rvl-cdip.

Dataset from Papers With Code

Saturated benchmark

Benchmark near ceiling or stagnant — no meaningful SOTA movement in 2+ years

§ 01 · Leaderboard

Best published scores.

37 results indexed across 3 metrics. Shaded row marks current SOTA; ties broken by submission date.


Primary metric: accuracy · higher is better
All metrics: accuracy, far, war

accuracy · primary · 35 rows
| # | Model | Org | Submitted | Paper / code | accuracy |
|---|---|---|---|---|---|
| 01 | EAML | | May 2023 | EAML: Ensemble Self-Attention-based Mutual Learning Netw… | 97.70 |
| 02 | Cross-Modal | | Jun 2020 | papers-with-code | 97.05 |
| 03 | DocFormer BASE | | Jun 2021 | DocFormer: End-to-End Transformer for Document Understan… · code | 96.17 |
| 04 | LayoutLMV3 Large | | Apr 2022 | LayoutLMv3: Pre-training for Document AI with Unified Te… · code | 95.93 |
| 05 | LiLT[EN-R] BASE | | Feb 2022 | LiLT: A Simple yet Effective Language-Independent Layout… · code | 95.68 |
| 06 | LayoutLMv2 LARGE | | Dec 2020 | LayoutLMv2: Multi-modal Pre-training for Visually-Rich D… · code | 95.64 |
| 07 | TILT-Large | | Feb 2021 | Going Full-TILT Boogie on Document Understanding with Te… · code | 95.52 |
| 08 | DocFormer large | | Jun 2021 | DocFormer: End-to-End Transformer for Document Understan… · code | 95.50 |
| 09 | LayoutLMv3 BASE | | Apr 2022 | LayoutLMv3: Pre-training for Document AI with Unified Te… · code | 95.44 |
| 10 | Donut | | Nov 2021 | OCR-free Document Understanding Transformer · code | 95.30 |
| 11 | LayoutLMv2 BASE | | Dec 2020 | LayoutLMv2: Multi-modal Pre-training for Visually-Rich D… · code | 95.25 |
| 12 | TILT-Base | | Feb 2021 | Going Full-TILT Boogie on Document Understanding with Te… · code | 95.25 |
| 13 | LayoutXLM | | Apr 2021 | LayoutXLM: Multimodal Pre-training for Multilingual Visu… · code | 95.21 |
| 14 | StrucTexTv2 (large) | | Mar 2023 | StrucTexTv2: Masked Visual-Textual Prediction for Docume… · code | 94.62 |
| 15 | Pre-trained LayoutLM | | Dec 2019 | LayoutLM: Pre-training of Text and Layout for Document I… · code | 94.42 |
| 16 | DoPTA | | Dec 2024 | DoPTA: Improving Document Layout Analysis using Patch-Te… | 94.12 |
| 17 | DoPTA-HR (512×512) | | Dec 2024 | arxiv | 94.07 |
| 18 | DocXClassifier-B | | Mar 2022 | papers-with-code · code | 94 |
| 19 | HEADoC-Large | | Oct 2025 | springer | 93.62 |
| 20 | StrucTexTv2 (small) | | Mar 2023 | StrucTexTv2: Masked Visual-Textual Prediction for Docume… · code | 93.40 |
| 21 | VLCDoC | | May 2022 | VLCDoC: Vision-Language Contrastive Pre-Training Model f… | 93.19 |
| 22 | TransferDoc | | Sep 2023 | GlobalDoc: A Cross-Modal Vision-Language Framework for R… | 93.18 |
| 23 | DoPTA (224×224) | | Dec 2024 | arxiv | 92.96 |
| 24 | HEADoC-Base | | Oct 2025 | springer | 92.95 |
| 25 | Multimodal (ResNet50) | | Jan 2023 | Multimodal Side-Tuning for Document Classification · code | 92.70 |
| 26 | DiT-L | | Mar 2022 | DiT: Self-supervised Pre-training for Document Image Tra… · code | 92.69 |
| 27 | Pre-trained EfficientNet | | Jun 2020 | Improving accuracy and speeding up Document Image Classi… · code | 92.31 |
| 28 | Transfer Learning from VGG16 trained on Imagenet | | Jan 2018 | Document Image Classification with Intra-Domain Transfer… · code | 92.21 |
| 29 | Multimodal (MobileNetV2) | | Jan 2023 | Multimodal Side-Tuning for Document Classification · code | 92.20 |
| 30 | DiT-B | | Mar 2022 | DiT: Self-supervised Pre-training for Document Image Tra… · code | 92.11 |
| 31 | BEiT-B | | Jun 2021 | BEiT: BERT Pre-Training of Image Transformers · code | 91.09 |
| 32 | Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50 | | Apr 2017 | Cutting the Error by Half: Investigation of Very Deep CN… · code | 90.97 |
| 33 | AlexNet + spatial pyramidal pooling + image resizing | | Aug 2017 | Analysis of Convolutional Neural Networks for Document I… | 90.94 |
| 34 | DeiT-B · OSS | Meta | Dec 2020 | Training data-efficient image transformers & distillatio… · code | 90.32 |
| 35 | Roberta base | | Jul 2019 | RoBERTa: A Robustly Optimized BERT Pretraining Approach · code | 90.06 |
far · 1 row

| # | Model | Org | Submitted | Paper / code | far |
|---|---|---|---|---|---|
| 01 | VisualWordGrid | | Oct 2020 | VisualWordGrid: Information Extraction From Scanned Docu… | 28.70 |

war · 1 row

| # | Model | Org | Submitted | Paper / code | war |
|---|---|---|---|---|---|
| 01 | VisualWordGrid | | Oct 2020 | VisualWordGrid: Information Extraction From Scanned Docu… | 18.70 |

Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
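
The ordering rule stated for the leaderboard (score first, ties broken by submission date) can be sketched as a two-part sort key. This is a minimal illustration, not the site's actual code. It assumes that the earlier submission wins a tie, which matches rows 11 and 12 (LayoutLMv2 BASE, Dec 2020, listed ahead of TILT-Base, Feb 2021, both at 95.25). The page shows only month and year, so the day of month below is a placeholder.

```python
from datetime import date

# (model, accuracy, submitted) for a few rows copied from the accuracy table.
# Only month and year appear on the page; day=1 is a placeholder.
rows = [
    ("TILT-Base", 95.25, date(2021, 2, 1)),
    ("LayoutLMv2 BASE", 95.25, date(2020, 12, 1)),
    ("EAML", 97.70, date(2023, 5, 1)),
    ("Donut", 95.30, date(2021, 11, 1)),
]


def rank(rows):
    """Sort by score descending, then by submission date ascending.

    Assumption: on equal scores the earlier submission ranks higher.
    """
    return sorted(rows, key=lambda r: (-r[1], r[2]))


ranked = rank(rows)
for position, (model, accuracy, submitted) in enumerate(ranked, start=1):
    print(f"{position:02d} {model} {accuracy:.2f} {submitted:%b %Y}")
```

Negating the score inside the key keeps the sort to a single pass while letting the date component stay in its natural ascending order.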
§ 03 · Progress

5 steps of state of the art.

Each row below marks a model that broke the previous record on accuracy. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · accuracy
  1. Apr 11, 2017 · Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50 · 90.97
  2. Jan 29, 2018 · Transfer Learning from VGG16 trained on Imagenet · 92.21
  3. Dec 31, 2019 · Pre-trained LayoutLM · 94.42
  4. Jun 16, 2020 · Cross-Modal · 97.05
  5. May 11, 2023 · EAML · 97.70
Fig 3 · SOTA-setting models only. 5 entries span Apr 2017 to May 2023.
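
The SOTA line above amounts to a running maximum over submissions ordered by date. The sketch below shows one way to derive it, assuming a new step requires a strictly higher accuracy than the previous best (the page does not say how an exact tie would be treated). The data is a subset of leaderboard rows. Exact days come from the progress list where it gives them; other rows use day=1 as a placeholder.

```python
from datetime import date

# (model, accuracy, date) for a subset of leaderboard rows.
submissions = [
    ("Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50", 90.97, date(2017, 4, 11)),
    ("AlexNet + spatial pyramidal pooling + image resizing", 90.94, date(2017, 8, 1)),
    ("Transfer Learning from VGG16 trained on Imagenet", 92.21, date(2018, 1, 29)),
    ("Roberta base", 90.06, date(2019, 7, 1)),
    ("Pre-trained LayoutLM", 94.42, date(2019, 12, 31)),
    ("Cross-Modal", 97.05, date(2020, 6, 16)),
    ("DocFormer BASE", 96.17, date(2021, 6, 1)),
    ("EAML", 97.70, date(2023, 5, 11)),
]


def sota_steps(submissions):
    """Return the entries that beat every earlier score, in date order."""
    steps = []
    best = float("-inf")
    for model, score, when in sorted(submissions, key=lambda s: s[2]):
        if score > best:  # assumption: a tie does not count as a new record
            best = score
            steps.append((when, model, score))
    return steps


steps = sota_steps(submissions)
for when, model, score in steps:
    print(f"{when:%b %d, %Y} · {model} · {score:.2f}")
```

Entries such as DocFormer BASE (96.17, Jun 2021) are skipped because Cross-Modal had already reached 97.05 a year earlier.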
§ 04 · Literature

23 papers tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and — if it takes the top — annotate the step on the progress chart with your name.

What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with frozen commit + seed
  • 03 · Declared evaluation environment (Python, deps)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies
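
One way to keep the five items above together is a small manifest written by the reproduction script. The sketch below is purely illustrative: the page does not describe a submission format, so the function name `build_manifest`, every field name, and the placeholder values are hypothetical. It records a fixed seed, the commit, the Python environment, and one score slot per metric listed for this dataset (accuracy, far, war).

```python
import json
import platform
import random


def build_manifest(checkpoint, commit, seed, scores, contact):
    """Collect the declared submission items into one JSON-serializable dict.

    All field names are hypothetical; they are not a documented format.
    """
    # Seeds Python's own RNG only; a real script would also seed its ML framework.
    random.seed(seed)
    return {
        "checkpoint": checkpoint,
        "commit": commit,
        "seed": seed,
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
        },
        "scores": scores,
        "contact": contact,
    }


manifest = build_manifest(
    checkpoint="<public-checkpoint-url>",
    commit="<frozen-commit-sha>",
    seed=0,
    # One entry per metric declared by the dataset; values left unset here.
    scores={"accuracy": None, "far": None, "war": None},
    contact="<contact-address>",
)
print(json.dumps(manifest, indent=2))
```

A pinned dependency list (for example the output of `pip freeze`) could be added under `environment` to cover the "deps" part of item 03.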