Model card
RoDLA.
Chen, Zhang et al. · open-source · Unknown params · DINO-based detector with InternImage backbone + channel attention blocks
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models. DINO-based architecture using an InternImage backbone (ImageNet-22K pretrained), with channel attention and average pooling in the encoder for perturbation-resistant features. 96.0 mAP on clean PubLayNet-val. CVPR 2024. arXiv 2403.14442.
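To make "channel attention with average pooling" concrete, here is a minimal, generic sketch in plain Python. It is not RoDLA's actual encoder block: the single linear layer, the sigmoid gate, and the `weights` and `biases` parameters are assumptions chosen for illustration. It only shows the general pattern of pooling each channel to one number, turning those numbers into per-channel gates, and rescaling the feature map.

```python
import math


def channel_attention(features, weights, biases):
    """Rescale each channel by a gate computed from globally pooled features.

    features: list of C channels, each an H x W nested list of floats.
    weights:  C x C nested list (hypothetical learned parameters).
    biases:   list of C floats (hypothetical learned parameters).
    Returns (scaled_features, gates).
    """
    # Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]

    # One linear layer plus a sigmoid gives a gate in (0, 1) per channel.
    gates = []
    for w_row, b in zip(weights, biases):
        z = sum(w * p for w, p in zip(w_row, pooled)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))

    # Multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates
```

Because the gates depend only on channel-wide averages, a perturbation confined to a few pixels moves them very little, which is one intuition for why pooled attention is used when robustness matters.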
§ 01 · Benchmarks
Every benchmark RoDLA has a recorded score for.
| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | publaynet-val | Computer Vision · Document Layout Analysis | Overall | 96.0% | #2 | — | source ↗ |
The Rank column shows this model’s position among all models scored on the same benchmark and metric. #1 marks the current SOTA. Rows are sorted by rank, then by newest result.
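The ranking and ordering rule described above can be sketched as follows. This is a hedged illustration, not the site's actual code: the `Result` record, the function names, and the assumption that a higher value is better are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    model: str
    benchmark: str
    metric: str
    value: float  # assumed: higher is better
    date: str     # ISO date string; "" when unknown


def rank_of(model, benchmark, metric, results):
    """Return (rank, competitors) for one model on one benchmark + metric."""
    # Only results on the same benchmark and metric are comparable.
    pool = [r for r in results if r.benchmark == benchmark and r.metric == metric]
    mine = next(r for r in pool if r.model == model)
    # Rank is 1 plus the number of models with a strictly better value.
    rank = 1 + sum(1 for r in pool if r.value > mine.value)
    return rank, len(pool) - 1


def sort_rows(rows):
    """Sort (rank, date, benchmark) rows by rank, then newest date first."""
    # Python's sort is stable, so sorting by the secondary key first and
    # the primary key second yields the combined ordering.
    newest_first = sorted(rows, key=lambda row: row[1], reverse=True)
    return sorted(newest_first, key=lambda row: row[0])
```

With three hypothetical models scored on one benchmark and metric, the model holding the second-highest value gets `(2, 2)`: rank 2, with two other models in the pool.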
§ 05 · Sources & freshness
Where these numbers come from.
cvpr-2024 · 1 result
0 of 1 rows marked verified.