Document Image Classification (2020, en)
rvl-cdip
Dataset from Papers With Code
Metrics: accuracy, cer, wer, f1
Current State of the Art
EAML
Unknown
97.7
accuracy
accuracy Progress Over Time
Showing 5 breakthroughs from Apr 2017 to May 2023
Key Milestones
Apr 2017
Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50
From paper: Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification
91.0
Jan 2018
Transfer Learning from VGG16 trained on Imagenet
From paper: Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks
92.2
+1.4%
Dec 2019
Pre-trained LayoutLM
From paper: LayoutLM: Pre-training of Text and Layout for Document Image Understanding
94.4
+2.4%
Jun 2020
Cross-Modal
From paper: Visual and Textual Deep Feature Fusion for Document Image Classification
97.0
+2.8%
May 2023
EAML (Current SOTA)
From paper: EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification
97.7
+0.7%
Total Improvement
7.4%
Time Span
6y 2m
Breakthroughs
5
Current SOTA
97.7
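The step gains listed under Key Milestones (for example +1.4% from 91.0 to 92.2) do not match a simple difference in accuracy points. They appear to be relative improvements computed from the unrounded scores in the leaderboard table. The sketch below checks that reading. It is an inference from the numbers on this page, not a documented formula, and the names `milestones` and `relative_gain` are illustrative.

```python
# Unrounded accuracies of the five milestone models, taken from the leaderboard table.
milestones = [
    ("Apr 2017", 90.97),
    ("Jan 2018", 92.21),
    ("Dec 2019", 94.42),
    ("Jun 2020", 97.05),
    ("May 2023", 97.7),
]


def relative_gain(previous: float, current: float) -> float:
    """Relative improvement of `current` over `previous`, in percent (1 decimal)."""
    return round((current - previous) / previous * 100, 1)


scores = [score for _, score in milestones]

# Gain of each milestone over the one before it.
step_gains = [relative_gain(a, b) for a, b in zip(scores, scores[1:])]

# Gain of the latest milestone over the first one.
total_gain = relative_gain(scores[0], scores[-1])

print(step_gains)  # [1.4, 2.4, 2.8, 0.7]
print(total_gain)  # 7.4
```

Under this assumption the output reproduces both the per-milestone gains and the 7.4% total improvement shown above.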
Top Models Performance Comparison
Top 10 models ranked by accuracy
Best Score
97.7
Top Model
EAML
Models Compared
10
Score Range
2.4
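The summary figures above cover only the ten highest-ranked models, while the table that follows lists 31. As a quick check, assuming the score range is simply the best minus the worst accuracy among those ten rows, it can be recomputed from the table values (the variable names are illustrative):

```python
# Accuracy of the ten highest-ranked models, copied from the leaderboard table.
top10 = [97.7, 97.05, 96.17, 95.93, 95.68, 95.64, 95.52, 95.5, 95.44, 95.3]

best_score = max(top10)
# Spread between the best and the tenth-best model, rounded to one decimal.
score_range = round(best_score - min(top10), 1)

print(len(top10), best_score, score_range)  # 10 97.7 2.4
```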
accuracy (Primary)
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | EAML | 97.7 | | May 2023 |
| 2 | Cross-Modal | 97.05 | Visual and Textual Deep Feature Fusion for Document Image Classification | Jun 2020 |
| 3 | DocFormerBASE | 96.17 | | Jun 2021 |
| 4 | LayoutLMV3Large | 95.93 | | Apr 2022 |
| 5 | LiLT[EN-R]BASE | 95.68 | | Feb 2022 |
| 6 | LayoutLMv2LARGE | 95.64 | | Dec 2020 |
| 7 | TILT-Large | 95.52 | | Feb 2021 |
| 8 | DocFormer large | 95.5 | | Jun 2021 |
| 9 | LayoutLMv3BASE | 95.44 | | Apr 2022 |
| 10 | Donut | 95.3 | | Nov 2021 |
| 11 | TILT-Base | 95.25 | | Feb 2021 |
| 12 | LayoutLMv2BASE | 95.25 | | Dec 2020 |
| 13 | LayoutXLM | 95.21 | | Apr 2021 |
| 14 | StrucTexTv2 (large) | 94.62 | | Mar 2023 |
| 15 | Pre-trained LayoutLM | 94.42 | | Dec 2019 |
| 16 | DoPTA | 94.12 | | Dec 2024 |
| 17 | DocXClassifier-B | 94 | DocXClassifier: High Performance Explainable Deep Network for Document Image Classification | Mar 2022 |
| 18 | StrucTexTv2 (small) | 93.4 | | Mar 2023 |
| 19 | VLCDoC | 93.19 | | May 2022 |
| 20 | TransferDoc | 93.18 | | Sep 2023 |
| 21 | Multimodal (ResNet50) | 92.7 | | Jan 2023 |
| 22 | DiT-L | 92.69 | | Mar 2022 |
| 23 | Pre-trained EfficientNet | 92.31 | | Jun 2020 |
| 24 | Transfer Learning from VGG16 trained on Imagenet | 92.21 | | Jan 2018 |
| 25 | Multimodal (MobileNetV2) | 92.2 | | Jan 2023 |
| 26 | DiT-B | 92.11 | | Mar 2022 |
| 27 | BEiT-B | 91.09 | | Jun 2021 |
| 28 | Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50 | 90.97 | | Apr 2017 |
| 29 | AlexNet + spatial pyramidal pooling + image resizing | 90.94 | | Aug 2017 |
| 30 | DeiT-B (Open Source, Meta) | 90.32 | | Dec 2020 |
| 31 | Roberta base | 90.06 | | Jul 2019 |
far
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | VisualWordGrid | 28.7 | | Oct 2020 |
war
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | VisualWordGrid | 18.7 | | Oct 2020 |
Related Papers (23)
DoPTA: Improving Document Layout Analysis using Patch-Text Alignment
Dec 2024. Models: DoPTA
StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training
Mar 2023. Models: StrucTexTv2 (large), StrucTexTv2 (small)
Multimodal Side-Tuning for Document Classification
Jan 2023. Models: Multimodal (ResNet50), Multimodal (MobileNetV2)
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Apr 2022. Models: LayoutLMV3Large, LayoutLMv3BASE
DiT: Self-supervised Pre-training for Document Image Transformer
Mar 2022. Models: DiT-L, DiT-B
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
Feb 2022. Models: LiLT[EN-R]BASE
OCR-free Document Understanding Transformer
Nov 2021. Models: Donut
DocFormer: End-to-End Transformer for Document Understanding
Jun 2021. Models: DocFormerBASE, DocFormer large
BEiT: BERT Pre-Training of Image Transformers
Jun 2021. Models: BEiT-B
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
Apr 2021. Models: LayoutXLM
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
Feb 2021. Models: TILT-Large, TILT-Base
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Dec 2020. Models: LayoutLMv2LARGE, LayoutLMv2BASE
Training data-efficient image transformers & distillation through attention
Dec 2020. Models: DeiT-B
VisualWordGrid: Information Extraction From Scanned Documents Using A Multimodal Approach
Oct 2020. Models: VisualWordGrid
Improving accuracy and speeding up Document Image Classification through parallel systems
Jun 2020. Models: Pre-trained EfficientNet
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Dec 2019. Models: Pre-trained LayoutLM
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Jul 2019. Models: Roberta base
Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks
Jan 2018. Models: Transfer Learning from VGG16 trained on Imagenet
Analysis of Convolutional Neural Networks for Document Image Classification
Aug 2017. Models: AlexNet + spatial pyramidal pooling + image resizing
Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification
Apr 2017. Models: Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50