Fox-Page-En is the English subset of the Fox benchmark for fine-grained multi-page document understanding, introduced in the paper "Focus Anywhere for Fine-grained Multi-page Document Understanding" (arXiv:2405.14295). It contains PDF page images and the corresponding OCR/annotation files, drawn from the Fox benchmark for page-level evaluation.

In the reported experiments, the authors restrict evaluation to pages containing 600-1300 text tokens (text tokenized with the DeepSeek-OCR tokenizer, vocabulary ~129k) and state that they selected 100 pages in that range for a particular evaluation. They report precision at different vision-token budgets (e.g., 64 and 100 vision tokens) across token bins.

The Fox project provides code and benchmark data via its GitHub repository (https://github.com/ucaslcl/Fox). A Hugging Face-hosted subset, EduardoPacheco/Fox-Page-En, offers convenient access; according to the HF subset page and an associated issue, this subset contains 112 English page samples as maintained by the HF contributor.

Use cases: page-level OCR, region-level OCR, region-level summarization/translation, and other fine-grained document understanding evaluations for LVLMs.
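A minimal sketch of how one might load the HF-hosted subset and reproduce the 600-1300 token page-selection criterion described above. The dataset id `EduardoPacheco/Fox-Page-En` comes from the description; the split name, the `"text"` column, and the whitespace tokenizer stand in for the real schema and the DeepSeek-OCR tokenizer, so treat them as assumptions to verify against the actual dataset.

```python
def in_token_range(n_tokens: int, lo: int = 600, hi: int = 1300) -> bool:
    """True if a page's token count falls in the evaluation range from the paper."""
    return lo <= n_tokens <= hi


def select_pages(samples, tokenize=str.split, lo=600, hi=1300):
    """Filter an iterable of {'text': ...} samples by tokenized length.

    `tokenize` defaults to whitespace splitting as a stand-in; substitute the
    DeepSeek-OCR tokenizer for counts that match the reported setup.
    """
    return [s for s in samples if in_token_range(len(tokenize(s["text"])), lo, hi)]


if __name__ == "__main__":
    # Requires network access and `pip install datasets`; column names assumed.
    from datasets import load_dataset

    ds = load_dataset("EduardoPacheco/Fox-Page-En", split="train")
    pages = select_pages(ds)
    print(f"{len(pages)} pages in the 600-1300 token range")
```

The filtering helpers are separated from the download step so the selection logic can be reused with any tokenizer or sample source.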