Optical Character Recognition
2020
en
mldoc-zero-shot-english-to-russian
Dataset from Papers With Code
Metrics: accuracy, cer, wer, f1
Current State of the Art
XLMft UDA
Unknown
89.7
accuracy
accuracy Progress Over Time
Showing 3 breakthroughs from May 2018 to Sep 2019
Key Milestones
May 2018
BiLSTM (UN)
From paper: A Corpus for Multilingual Document Classification in Eight Languages
61.4
Dec 2018
Massively Multilingual Sentence Embeddings
From paper: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
67.8
+10.4%
Sep 2019
XLMft UDA (Current SOTA)
From paper: Bridging the domain gap in cross-lingual document classification
89.7
+32.3%
Total Improvement
46.0%
Time Span
1y 4m
Breakthroughs
3
Current SOTA
89.7
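The page does not say how the percentage figures are derived. A minimal sketch, assuming each one is a relative change over the earlier score and that the unrounded table scores (61.42 and 67.78) are the inputs; the name `relative_gain` is hypothetical and used only for illustration:

```python
# Milestone scores, taken from the unrounded values in the leaderboard table.
milestones = [
    ("May 2018", "BiLSTM (UN)", 61.42),
    ("Dec 2018", "Massively Multilingual Sentence Embeddings", 67.78),
    ("Sep 2019", "XLMft UDA", 89.7),
]


def relative_gain(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old (assumed formula)."""
    return (new - old) / old * 100.0


# Gain of each milestone over the one before it.
step_gains = [
    round(relative_gain(prev[2], cur[2]), 1)
    for prev, cur in zip(milestones, milestones[1:])
]

# Gain of the latest milestone over the first one.
total_gain = round(relative_gain(milestones[0][2], milestones[-1][2]), 1)

print(step_gains)
print(total_gain)
```

Under these assumptions the step gains round to 10.4 and 32.3 and the total to 46.0, which agrees with the figures shown. Starting from the rounded 61.4 instead would give a total of 46.1, so the unrounded scores seem the more likely inputs.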
Top Models Performance Comparison
Top 5 models ranked by accuracy
Best Score
89.7
Top Model
XLMft UDA
Models Compared
5
Score Range
28.9
accuracy (Primary)
| # | Model | Score | Paper / Code | Date |
|---|---|---|---|---|
| 1 | XLMft UDA | 89.7 | | Sep 2019 |
| 2 | MultiFiT, pseudo | 67.83 | | Sep 2019 |
| 3 | Massively Multilingual Sentence Embeddings | 67.78 | | Dec 2018 |
| 4 | BiLSTM (UN) | 61.42 | | May 2018 |
| 5 | MultiCCA + CNN | 60.8 | | May 2018 |
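The summary figures shown with this comparison (best score, top model, models compared, score range) can be recomputed from the table rows. A small sketch, assuming the score range is the best score minus the lowest score among the listed models; the variable names are illustrative only:

```python
# (model, accuracy) pairs copied from the comparison table.
rows = [
    ("XLMft UDA", 89.7),
    ("MultiFiT, pseudo", 67.83),
    ("Massively Multilingual Sentence Embeddings", 67.78),
    ("BiLSTM (UN)", 61.42),
    ("MultiCCA + CNN", 60.8),
]

# Rank by accuracy, highest first.
ranked = sorted(rows, key=lambda row: row[1], reverse=True)

top_model, best_score = ranked[0]
lowest_score = ranked[-1][1]

# Assumed definition: spread between the best and worst listed scores.
score_range = round(best_score - lowest_score, 1)
models_compared = len(ranked)

print(top_model, best_score, models_compared, score_range)
```

With the five rows above this yields XLMft UDA as the top model at 89.7 and a score range of 28.9, consistent with the summary values.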
Related Papers (4)
Bridging the domain gap in cross-lingual document classification
Sep 2019 | Models: XLMft UDA
MultiFiT: Efficient Multi-lingual Language Model Fine-tuning
Sep 2019 | Models: MultiFiT, pseudo
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
Dec 2018 | Models: Massively Multilingual Sentence Embeddings
A Corpus for Multilingual Document Classification in Eight Languages
May 2018 | Models: BiLSTM (UN), MultiCCA + CNN