Model card
DiT-Base.
Microsoft · classification · Vision Transformer (self-supervised)
Document Image Transformer pre-trained on IIT-CDIP using masked image modeling. arXiv:2203.02378.
§ 01 · Benchmarks
Every benchmark DiT-Base has a recorded score for.
| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | tobacco-3482 | Computer Vision · Document Image Classification | accuracy | 84.1% | #7 | — | source ↗ |
The Rank column shows this model's position among all models scored on the same benchmark and metric; where a slash is shown, the number of competitors follows it. #1 (shown in red) marks the current SOTA. Rows are sorted by rank, then by newest result.
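The ranking and sort order described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual implementation: the `Result` type, the `rank_of` and `sort_rows` helpers, and every model, benchmark, and score in the sample data are hypothetical. It also assumes that a higher metric value is better and that undated results sort last within a rank.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Result:
    model: str
    benchmark: str
    metric: str
    value: float
    reported: Optional[date] = None  # None when no date is recorded


def rank_of(target: Result, results: list[Result]) -> int:
    """1-based rank among results sharing the benchmark and metric.

    Assumes higher values are better (true for accuracy).
    """
    peers = [
        r for r in results
        if r.benchmark == target.benchmark and r.metric == target.metric
    ]
    return 1 + sum(1 for r in peers if r.value > target.value)


def sort_rows(rows: list[tuple[int, Result]]) -> list[tuple[int, Result]]:
    """Sort by rank ascending, then newest result first; undated rows last."""
    return sorted(
        rows,
        key=lambda row: (row[0], -(row[1].reported or date.min).toordinal()),
    )


# Hypothetical data, for illustration only (not real leaderboard entries).
results = [
    Result("model-a", "bench-1", "accuracy", 91.0, date(2023, 5, 1)),
    Result("model-b", "bench-1", "accuracy", 84.1),
    Result("model-c", "bench-1", "accuracy", 88.5, date(2024, 1, 15)),
    Result("model-b", "bench-2", "accuracy", 70.0, date(2022, 3, 1)),
    Result("model-a", "bench-2", "accuracy", 65.0, date(2022, 3, 1)),
]

# Build the table rows for one model, ranked within each benchmark + metric.
model_b_rows = sort_rows(
    [(rank_of(r, results), r) for r in results if r.model == "model-b"]
)
for rank, r in model_b_rows:
    print(f"{r.benchmark}: #{rank}")
```

Ranking is computed per benchmark and metric pair, so the same model can hold a different position in each row of its table.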
§ 04 · Related models
Other Microsoft models scored on Codesota.
§ 05 · Sources & freshness
Where these numbers come from.
Label Errors in the Tobacco3482 Dataset: 1 result
1 of 1 rows marked verified.