RoDLA.

Chen, Zhang et al. · open-source · Unknown params · DINO-based detector with InternImage backbone + channel attention blocks

RoDLA: Benchmarking the Robustness of Document Layout Analysis Models. A DINO-based architecture that uses an InternImage backbone (ImageNet-22K pretrained), with channel attention and average pooling in the encoder for perturbation-resistant features. It reaches 96.0 mAP on clean PubLayNet-val. CVPR 2024. arXiv 2403.14442.

§ 02 · Benchmarks

Every benchmark RoDLA has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	publaynet-val	Computer Vision · Document Layout Analysis	Overall	1.0%	#2/2	—	source ↗

The Rank column shows this model’s position versus all other models scored on the same benchmark and metric (competitors after the slash). #1 in red marks the current SOTA. Rows are sorted by rank, then by newest result.

§ 03 · Strengths by area

Where RoDLA actually performs.

Computer Vision

benchmark

avg rank #2.0

§ 06 · Sources & freshness

Where these numbers come from.

cvpr-2024

result

0 of 1 rows marked verified.