Optical Character Recognition (2020, en)

mldoc-zero-shot-english-to-spanish

Dataset from Papers With Code

Metrics: accuracy, CER, WER, F1
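The page lists these metrics but does not define them. The following are the standard textbook definitions, not taken from this page. Character error rate (CER) and word error rate (WER) are the Levenshtein edit distance between the reference and the predicted text, divided by the reference length in characters or words respectively. Accuracy, the only metric reported on this leaderboard, is the fraction of predicted labels that exactly match the gold labels. Below is a minimal Python sketch of these definitions. The function names are hypothetical, and F1 is omitted for brevity.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance; substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when equal)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits per reference character."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits per reference word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def accuracy(labels, predictions) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


print(cer("kitten", "sitting"))           # 0.5 (3 edits / 6 characters)
print(wer("the cat sat", "the cat sit"))  # 1/3 (1 edit / 3 words)
print(accuracy(["a", "b", "c", "d"], ["a", "b", "c", "a"]))  # 0.75
```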
Current State of the Art

XLMft UDA

Unknown

96.8 accuracy

Accuracy Progress Over Time

Showing 3 breakthroughs from May 2018 to Sep 2019

[Chart: accuracy vs. date, May 2018 to Sep 2019]

Key Milestones

May 2018
MultiCCA + CNN

From paper: A Corpus for Multilingual Document Classification in Eight Languages

72.5
Dec 2018
Massively Multilingual Sentence Embeddings

From paper: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

77.3
+6.7%
Sep 2019
XLMft UDA (current SOTA)

From paper: Bridging the domain gap in cross-lingual document classification

96.8
+25.2%

Total improvement: 33.5%
Time span: 1y 4m
Breakthroughs: 3
Current SOTA: 96.8
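The percentages above appear to be relative gains over the previous milestone, (new - old) / old, and the total improvement is the same ratio taken between the first and last milestones. The following Python sketch reproduces these figures from the listed scores, assuming that formula. Only the months are given on the page, so dates are represented as hypothetical (year, month) tuples. With the rounded scores shown, the first gain comes out to 6.6% rather than the listed 6.7%. The page may be computing from unrounded values.

```python
# (year, month), model, accuracy: values copied from the Key Milestones list
milestones = [
    ((2018, 5), "MultiCCA + CNN", 72.5),
    ((2018, 12), "Massively Multilingual Sentence Embeddings", 77.3),
    ((2019, 9), "XLMft UDA", 96.8),
]


def relative_gain(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


# Gain of each milestone over the one before it
for (_, _, prev), (_, name, curr) in zip(milestones, milestones[1:]):
    print(f"{name}: +{relative_gain(prev, curr):.1f}%")

# Overall gain and elapsed time between first and last milestone
total = relative_gain(milestones[0][2], milestones[-1][2])
(start_y, start_m), (end_y, end_m) = milestones[0][0], milestones[-1][0]
months = (end_y - start_y) * 12 + (end_m - start_m)
print(f"Total improvement: {total:.1f}%")
print(f"Time span: {months // 12}y {months % 12}m")
```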

Top Models Performance Comparison

Top 6 models ranked by accuracy

Rank  Model                         Accuracy  % of best
1     XLMft UDA                     96.8      100.0%
2     MultiFiT, pseudo              79.1      81.7%
3     Massively Multilingual Se...  77.3      79.9%
4     MultiCCA + CNN                72.5      74.9%
5     BiLSTM (UN)                   69.5      71.8%
6     BiLSTM (Europarl)             66.7      68.9%

Best score: 96.8
Top model: XLMft UDA
Models compared: 6
Score range: 30.1
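The "% of best" column is each model's accuracy divided by the top score, and the score range is the best score minus the worst. The following Python sketch recomputes both from the table, assuming those definitions. The truncated model label is kept exactly as the page displays it.

```python
# Accuracy per model, copied from the comparison table
scores = {
    "XLMft UDA": 96.8,
    "MultiFiT, pseudo": 79.1,
    "Massively Multilingual Se...": 77.3,
    "MultiCCA + CNN": 72.5,
    "BiLSTM (UN)": 69.5,
    "BiLSTM (Europarl)": 66.7,
}

best = max(scores.values())
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Each model's score as a fraction of the best score, formatted as a percentage
for rank, (name, score) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: {score} ({score / best:.1%} of best)")

score_range = best - min(scores.values())
print(f"Score range: {score_range:.1f}")
```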

Primary metric: accuracy

Related papers: 4

Other Optical Character Recognition Datasets