5,000 images with fine annotations and 20,000 with coarse annotations of urban street scenes.
3 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.
| # | Model | Org | Submitted | Paper / code | mIoU (%) |
|---|---|---|---|---|---|
| 01 | EoMT (ViT-L) | — | Mar 2025 | Your ViT is Secretly an Image Segmentation Model · code | 84.20 |
| 02 | DINOv3 (7B) | — | Aug 2025 | DINOv3 · code | 81.10 |
| 03 | DINOv2 (ViT-g/14) | — | Apr 2023 | DINOv2: Learning Robust Visual Features without Supervis… · code | 81 |
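The ordering rule stated above the table (rank by score, ties broken by submission date) can be sketched in a few lines of Python. This is only an illustration, not the site's actual ranking code: the `rank` helper is hypothetical, the direction of the tie-break (earlier submission wins) is an assumption the page does not spell out, and the day of the month is a placeholder because the table lists only month and year.

```python
from datetime import date

# Rows copied from the table above: (model, submission month, mIoU).
# Only month and year are listed, so the day is set to 1 as a placeholder.
rows = [
    ("DINOv2 (ViT-g/14)", date(2023, 4, 1), 81.0),
    ("EoMT (ViT-L)", date(2025, 3, 1), 84.20),
    ("DINOv3 (7B)", date(2025, 8, 1), 81.10),
]


def rank(rows):
    """Hypothetical helper: sort by mIoU descending, then by date ascending.

    Assumption: on equal mIoU, the earlier submission ranks higher.
    """
    return sorted(rows, key=lambda r: (-r[2], r[1]))


ranked = rank(rows)
for position, (model, submitted, miou) in enumerate(ranked, start=1):
    print(f"{position:02d} | {model} | {submitted:%b %Y} | {miou:.2f}")
```

Sorting on the tuple `(-score, date)` keeps the rule in one place: the negated score puts the highest mIoU first, and the date only matters when two scores are exactly equal.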
Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.
Submit a checkpoint and a reproduction script. We will run the script, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.