DoTA (Document image machine Translation dataset of ArXiv articles in markdown format) is a large-scale dataset of document-image → translation pairs introduced for document image machine translation (DIMT). It was built from arXiv articles and evaluates translation of long-context, complex-layout document images (e.g., whole pages with tables, figures, and sections) into markdown-formatted target text. The NAACL 2024 paper reports a filtered set of about 126K image–translation pairs; the authors also release an unfiltered collection of roughly 139K samples in the public repository/dataset. The evaluated subset translates English source pages into Chinese (en→zh), and the dataset metadata indicates other language variants are present. The dataset is distributed under an MIT license on Hugging Face, where access is gated behind agreement to the access conditions.
COMET is the evaluation metric reported for DoTA (en→zh). Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year | Source |
|---|---|---|---|---|---|
| 1 | HunyuanOCR (1B) | paper | 83.48 | N/A | Source ↗ |
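For readers reproducing scores like the one above, the sketch below shows how reference-based COMET scoring is typically run with the open-source `unbabel-comet` package. The helper names (`build_comet_inputs`, `score_with_comet`) and the checkpoint `Unbabel/wmt22-comet-da` are illustrative assumptions; the leaderboard sources may use a different COMET checkpoint or version.

```python
def build_comet_inputs(sources, hypotheses, references):
    """Pair each source page text with its MT output and reference translation,
    in the {"src", "mt", "ref"} format COMET expects."""
    if not (len(sources) == len(hypotheses) == len(references)):
        raise ValueError("sources, hypotheses, and references must align")
    return [
        {"src": s, "mt": h, "ref": r}
        for s, h, r in zip(sources, hypotheses, references)
    ]


def score_with_comet(data, checkpoint="Unbabel/wmt22-comet-da"):
    """Return the corpus-level COMET system score for a list of input dicts.
    Network-heavy: downloads the checkpoint on first use (pip install unbabel-comet).
    The checkpoint name here is a common default, not necessarily the one
    used by the papers tracked above."""
    from comet import download_model, load_from_checkpoint

    model = load_from_checkpoint(download_model(checkpoint))
    # gpus=0 runs on CPU; raise batch_size on GPU for speed.
    return model.predict(data, batch_size=8, gpus=0).system_score
```

A typical call would build inputs from the markdown page text, the model's Chinese output, and the reference translation, then pass them to `score_with_comet`; scores are usually reported ×100 (e.g., 0.8348 → 83.48).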