Fox-Page-En is the English subset of the Fox benchmark for fine-grained multi-page document understanding, introduced in the paper "Focus Anywhere for Fine-grained Multi-page Document Understanding" (arXiv:2405.14295). It contains PDF page images and the corresponding OCR/annotation files, drawn from the Fox benchmark for page-level evaluation.

In the reported experiments, the authors restrict evaluation to pages containing 600-1300 text tokens (text tokenized with the DeepSeek-OCR tokenizer, vocabulary ~129k) and state that they selected 100 pages in that range for a particular evaluation. They report precision at different vision-token budgets (e.g., 64 and 100 vision tokens) across token bins.

The Fox project provides code and benchmark data via its GitHub repository (https://github.com/ucaslcl/Fox). A Hugging Face-hosted subset, EduardoPacheco/Fox-Page-En, offers convenient access; according to the HF subset page and an associated issue, this subset contains 112 English page samples as maintained by the HF contributor.

Use cases: page-level OCR, region-level OCR, region-level summarization/translation, and other fine-grained document understanding evaluations for LVLMs.
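A minimal sketch of how one might load the HF-hosted subset and reproduce the 600-1300 token page-selection criterion described above. The dataset id `EduardoPacheco/Fox-Page-En` comes from the description; the split name, the `"text"` column, and the whitespace tokenizer stand in for the real schema and the DeepSeek-OCR tokenizer, so treat them as assumptions to verify against the actual dataset.

```python
def in_token_range(n_tokens: int, lo: int = 600, hi: int = 1300) -> bool:
    """True if a page's token count falls in the evaluation range from the paper."""
    return lo <= n_tokens <= hi


def select_pages(samples, tokenize=str.split, lo=600, hi=1300):
    """Filter an iterable of {'text': ...} samples by tokenized length.

    `tokenize` defaults to whitespace splitting as a stand-in; substitute the
    DeepSeek-OCR tokenizer for counts that match the reported setup.
    """
    return [s for s in samples if in_token_range(len(tokenize(s["text"])), lo, hi)]


if __name__ == "__main__":
    # Requires network access and `pip install datasets`; column names assumed.
    from datasets import load_dataset

    ds = load_dataset("EduardoPacheco/Fox-Page-En", split="train")
    pages = select_pages(ds)
    print(f"{len(pages)} pages in the 600-1300 token range")
```

The filtering helpers are separated from the download step so the selection logic can be reused with any tokenizer or sample source.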