Optical Character Recognition | 2025 | pl

CodeSOTA Polish OCR Benchmark

1,000 synthetic and real Polish text images at five degradation levels, from clean to severe. The benchmark tests character-level OCR on Polish diacritics and includes contamination-resistant synthetic categories. Categories: synth_random (pure character recognition); synth_words (Markov-generated words); real_corpus (Pan Tadeusz, official documents); wikipedia (potential contamination baseline).
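The synth_words category is described only as "Markov-generated words"; the generator itself is not documented here. As a rough illustration of the idea, the sketch below builds a first-order character-level Markov chain from a small seed list and samples pseudo-words from it. The seed words, the chain order, the `max_len` cap, and the function names `build_transitions` and `generate_word` are all assumptions made for this example, not the benchmark's actual procedure.

```python
import random
from collections import defaultdict

# Hypothetical sentinel symbols marking word start and end.
START, END = "^", "$"

def build_transitions(words):
    """Collect character-to-next-character transitions from seed words."""
    transitions = defaultdict(list)
    for word in words:
        chars = [START, *word.lower(), END]
        for current, following in zip(chars, chars[1:]):
            transitions[current].append(following)
    return transitions

def generate_word(transitions, rng, max_len=12):
    """Sample one pseudo-word by walking the chain until END or max_len."""
    out = []
    state = START
    while len(out) < max_len:
        state = rng.choice(transitions[state])
        if state == END:
            break
        out.append(state)
    return "".join(out)

# Illustrative seed list rich in Polish diacritics (an assumption).
seed_words = ["źdźbło", "żółć", "gęślą", "jaźń", "łąka"]
transitions = build_transitions(seed_words)
rng = random.Random(0)  # fixed seed so the sample is reproducible
sample = [generate_word(transitions, rng) for _ in range(5)]
print(sample)
```

Words produced this way follow Polish-like character statistics but are unlikely to appear verbatim in any training corpus, which is presumably why such a category resists contamination.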

Samples: 1,000
Metrics: CER, WER, accuracy
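The page does not say how the metrics are computed. A minimal sketch of the conventional definitions follows: CER is the character-level Levenshtein distance divided by the reference length, and WER is the same over whitespace-separated words. The function names, the NFC normalization step, and the handling of empty references are assumptions made here; the benchmark's own scoring, including its "accuracy" metric, may differ.

```python
import unicodedata

def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def _nfc(text):
    # Assumption: normalize so that precomposed and decomposed
    # diacritics (e.g. "ó" vs "o" + U+0301) compare as equal.
    return unicodedata.normalize("NFC", text)

def cer(reference, hypothesis):
    """Character error rate: edit distance / reference length."""
    ref, hyp = _nfc(reference), _nfc(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return levenshtein(ref, hyp) / len(ref)

def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    ref, hyp = _nfc(reference).split(), _nfc(hypothesis).split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return levenshtein(ref, hyp) / len(ref)

# An OCR output that strips diacritics is penalized per character.
print(cer("łódź", "lodz"))
print(wer("Pan Tadeusz", "Pan Tadeuz"))
```

Normalizing before comparison matters for Polish text, since a model that emits combining marks instead of precomposed letters would otherwise be scored as wrong on every diacritic.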

No benchmark results indexed for this dataset yet.


